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ON STOCHASTIC PROCESSES IN BIOLOGY 


A. T. Rem! 
Committee on Mathematical Biology 
The University of Chicago 


The purposes of this article are: (i) to discuss the role of the theory 
of stochastic processes in the methodology of mathematical biology, 
(ii) to present a review of some work dealing with the application of 
stochastic processes in biology, and (iii) to encourage, perhaps, other 
workers to utilise the theory of stochastic processes in formulating 
mathematical models of various biological phenomena. 


I. INTRODUCTION 


Mathematical biology is concerned with the development of a 
rationale for biological phenomena. Before the researches of N. Rashev- 
sky’ (31) towards the development of a mathematical biology numerous 
studies were available in what might be termed mathematical ecology 
(Volterra, Kostitzin, Gause) and mathematical genetics (Fisher, 
Haldane, Wright). In the early days of mathematical biology the 
methods used by Rashevsky and his associates were mainly those of 
classical mathematical physics, with models of biological phenomena 
being postulated which could be treated by utilising these methods. 
It soon became evident, however, that the complex phenomena asso- 
ciated with the biological world could not be fully investigated by the 
mathematical methods of physics alone, and that new methods designed, 
perhaps, especially for biology would be required. That this situation 
might arise should come as no surprise when we reflect on the uniformity 
of physical and biological processes. In dealing with physical processes 
we have few variables between which to establish a functional rela- 
tionship. Hence in physical experiments we need only to make a few 
determinations in order to predict the outcome of future experiments. 
This is an assumption on the uniformity of physical processes, and we 
find that it is deterministic. When we consider biological processes 
we find in the main a probabilistic or stochastic uniformity, a deter- 
ministic uniformity existing for only a few phenomena. 


1U. S. Public Health Service Research Fellow. 
2Numbers in parentheses indicate Reference cited. 
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With the above in mind it is perhaps clear why the formulation of 
abstract biological models should proceed more along stochastic rather 
than deterministic lines. Consider, for a moment, what we do in a 
biological experiment: We endeavour to measure the result of a series 
of interacting phenomena which vary in space as well as in time. We 
are unable, due to our ignorance of biological organisation, to establish 
an exact relationship between the interacting elements which we have 
been able to ascertain as playing a role in the reactions leading to the 
measured result. Hence any attempts to give an exact relationship 
between biological observables are usually futile, or lead to complex 
models of the underlying mechanism which defy experimental verifica- 
tion or refutation. In developing mathematical theories in biology we 
should postulate models which answer the following sort of question: 
If observations on a biological system are taken as elements of the set 
of all possible observations, what type of model will enable us to predict 
observations that as a rule will be elements of a specified subset of the 
set of all possible observations on the system, while it is possible that 
the observations will not be elements of the subset? At present the 
theory of probability and stochastic processes is the only mathematical 
scheme suitable for answering such questions, and hence formulating 
the desired mathematical theories in biology. 


II. SOME REMARKS ON STOCHASTIC PROCESSES 


In various problems treated by probabilistic methods one finds that 
the distribution of a random variable depends on a non-random variable 
which is a continuously varying parameter such as time. In these 
cases we speak of a stochastic process. Processes may be continuous or 
discontinuous depending on the distribution of the random variable. 
The theory of stochastic processes is now highly developed (9), (20); 
and there exists a large literature on the theory itself and its various 
applications. Outstanding applications are in fields such as physics, 
biology, and economics. 

In this section we wish to make a few remarks about some of the 
processes that have been used in various biological applications. We 
will be concerned only with a brief characterisation of these processes, 
and defining some terms that will be used laier. 

1. Markov chains. Let 2% , %; , 22 , «+: be a sequence of random 
variables, each of which can take on one - the values w, , w2., °° 
Let p;, denote Prob {z, = w; | x,-; = ,}; that is p;; is the seuhabiliay 
that 2, has the value w; given that x,-, had the value w; . The values 
w, are called the slates of the system defined by the matrix of transition 
probabilities P = (p,;). A matrix P is called stochastic if all of its 
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elements are non-negative, and the row sums are equal to unity. The 
matrix P together with the initial probability distribution {a,} for 
the states w, at time zero completely defines a simple Markov chain 
(11), (13), (35). A chain is said to be finite or infinite depending on the 
number of possible states w, , and discontinuous (discrete) or continuous 
depending on whether transitions between states take place at integral 
values of time or not. 

While the characterisation of a process as a simple Markov chain 
makes available for its study a well-developed theory, it imposes the 
restriction that the previous history of the system plays no role in 
determining its future course. In contradistinction to most physical 
systems, the future of biological systems is influenced by the past in a 
manner which leads one to believe that the past seems to reach out 
and grasp the future. Therefore in the study of biological phenomena 
as stochastic processes it is of interest to take these “hereditary” 
effects into consideration. This can be done by formulating the process 
asa multiple Markov chain. Multiple chains have transition probabilities 
depending on previous values of the random variable at more than 
one point, e.g., for a chain of order h, 


The order of the process refers to the number of previous values of the 
random variable which are used in defining the transition probabilities. 
Available work on multiple chains is mainly concerned with the re- 
duction to simple chains. 

2. Branching processes. These processes represent the evolution or 
development of systems whose components can reproduce, be trans- 
formed, and die; the transitions being governed by probability laws 
(14). We can characterise these processes in the following manner: 
Consider a sequence of random variables z, , 2,2 , --: . Let a single 
organism at time t = 0 have probabilities p, , p, , p2, +--+ , DP,» of having 
n-offsprings. The variable z, is interpreted as the number of organisms 
in the n-th generation. Let p;; = Prob {z, = j|2,-: = 7} be the con- 
ditional probability that there are j organisms in the n-th generation 
given that there were 7 organisms in the (n — 1)-st generation. We 
define a generating function for the probabilities p, as follows 


then assuming that the organisms of a generation reproduce indepen- 
dently we have p,; is equal to the coefficient of s‘ in [F(s)]’, and the 
process is a Markov chain determined by the matrix P = (p;;). We 
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have also p{"’ is equal to the coefficient of s* in [F,(s)]’, where F,(s) = 
F[F,-:(s)]; that is F,(s) is the n-th iterate of F(s). 

Since the publication of W. Feller’s paper (10) on a stochastic 
model of population growth, the theory of branching processes has 
found many useful applications in biology. A discussion of this work 
is not included in this article since the recent papers of M. S. Bartlett 
(4), D. G. Kendall (18), W. Feller (12), and P. Armitage (2) give a 
review of earlier work, as well as new results in the study of genetics, 
population growth, and bacterial mutations. A recent application of a 
branching model is given in a paper by 8. Iverson and N. Arley (15) 
on the mechanism of experimental carcinogenesis. In this theoretical 
and experimental study a branching model is postulated for the growth 
process in tumours, and the parameters occurring in the model are 
estimated from the experimental observations. 

R. Bellman and T. E. Harris (7) have recently discussed a non- 
Markovian generalisation of the process presented above. In this 
generalised process, termed “age-dependent”, the generation time 7 
from the birth of an organism until its own subdivision has the general 
distribution dG(r), 0 < + < o. The fundamental equation in the 
Bellman-Harris theory is the functional equation 


F(z, = “HFG, t — + 2[1 G0). 


In the above equation F(z,t) is the generating function for the number 
of organisms in the population at time ¢, and h(z) = °%.o q,2", where 
q,. is defined as the probability that an organism will give rise to n- 
offsprings when division takes place. In their paper only the case of 
binary fission was considered, i.e., h(z) = 2°. We have recently treated 
the case in which h(z) = ¢ + (1 — o)z’, i.e., the process considered is 
a modified birth-and-death process (33). 

Before the appearance of the Bellman-Harris paper D. G. Kendall 
(17) had introduced the idea of a variable generation time in the study 
of stochastic birth processes. The case treated by Kendall is obtained 
by letting G(t) be the k-fold convolution of the function 1 — e™. 
This is equivalent to requiring the organism to pass through k stages 
before division can take place. The data of C. D. Kelly and O. Rahn 
(16) on the growth rate of Bacterium aerogenes was used to evaluate 
the parameter k, and to show the applicability of the model to biological 
problems. 

The Bellman-Harris theory should have important biological 
applications (19), especially when other forms for G(¢) are considered, 
and the transformation probabilities g, are made functions of time, 
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and the population size. It is also of interest to design experiments 
which will enable us to determine the value of the various parameters 
occurring in age-dependent process. 

3. Statistical inference in Markov chains. In this section we wish to 
call attention to several studies concerned with statistical inference in 
Markov chains which should be of interest to biologists who want to 
test the adequacy of a postulated Markov chain model, or to see if 
their observations form a Markov chain. 

T. W. Anderson (1) has considered a Markov chain model, with 
stationary transition probabilities, for analysing time changes in atti- 
tudes. The unspecified parameters in the model were the transition 
probabilities p;; . Utilising experimental data the p;,;’s were estimated, 
and their statistical properties studied. Methods are also given for 
testing the hypothesis that: (i) the p;; = p* (a given number) for a 
specific 7 and j, (ii) the p,;; are not time dependent, and (iii) the process 
is of order one against the alternative that it is of order two. The 
results obtained are also generalised for multiple chains of order h. 

Recently M. S. Bartlett (5), (6) has investigated the problem of 
testing the goodness of fit of a theoretical model to a Markov chain of 
order h. A method is also given for finding the maximum likelihood 
estimates for the transition probabilities of a chain of order h. 

In two papers M. Ogawara (23), (24) has studied stationary pro- 
cesses of order h; and has given conditions, in the form of stochastic 
difference equations, that these processes and their autocorrelation 
coefficients must satisfy if they are to be of order h. 


III. APPLICATIONS 


1. Central nervous system. In 1948 A. Shimbel and A. Rapoport 
published a paper in which a probabilistic approach to the study of 
the structure and function of the central nervous system was developed 
(37). In this, and subsequent papers, a stochastic, rather than a 
deterministic, theory of neural nets has been developed and applied 
to numerous problems in neurobiophysics. When one considers that 
the human brain contains about 10" neurons, which are the basic 
units of which it is built, it is perhaps clear why a stochastic approach is 
needed, and why it has proven to be a fruitful cne. 

In this approach the parameters which ai? associated with the 
neurons (threshold, refractory period, etc.) are assumed to vary from 
neuron to neuron according to some distribution function. The patterns 
resulting from the various neurons interacting with one another are 
characterised by certain functions of point paths in space. These 
functions denote the probability that a neuron in a macroscopically 
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small region about one of the points receives an axone from a neuron 
in a microscopically small region surrounding the second point. Hence 
the system is viewed as an aggregate of neurons characterised by 
continuous distributions of ‘local properties’ together with certain 
tendencies of connection from region to region. 

The probabilistic approach to the study of the central nervous 
system has given rise to the very useful concept of a random net (39). 
A random net can be defined as follows: Consider a collection of points, 
each of which issues some number of outwardly directed lines (termed 
axones). Each axone terminates upon some point of the collection, 
and the probability that an axone from one point terminates on another 
point is the same for every pair of points in the collection. The re- 
sulting configuration is a random net. A random net has two important 
properties which we shall now define. The weak connectivity of a random 
net is the expected number of points in the collection to which there 
exists a path from an arbitrary point, if the points are not in any way 
distinguished from each other. The strong connectivity is defined as 
the probability that from an arbitrary point in the net there exist 
paths to every other point. R. Solomonoff and A. Rapoport, using an 
approximation method, have computed the weak connectivity when 
the number of points in the net, N, is large, and when a, the number of 
axones issued by any point, is greater than or equal to one. When N 
and a are both small Markov chain methods are applicable to the study 
of randon net structure. Recently R. Solomonoff (38) has obtained 
exact expressions for the weak and strong connectivity utilising a 
multiple Markov chain of order 2. 

Recent studies have been concerned with such topics as cycles and 
steady state phenomena in random nets, the spread of excitation in 
a random net, and nets with a distance bias (28), (29). In addition, 
A. Shimbel (36) has developed a learning theory, based on the lowering 
of thresholds of neurons, and applied it to several random net models. 

2. Radiobiology. In the study of radiobiological phenomena several 
Markov chain models have been postulated. In 1945 I. Opatowski 
(25) developed a general theory of chain processes and applied his 
results to the study of injury and recovery in biological systems following 
irradiation. In Opatowski’s theory the occurrence of an effective 
event in the sensitive volume of an organism is considered as a transition 
of that volume to a new state. The observed effect is reached after a 
certain number of transitions between states have taken place, so as 
to reach some final state which represents the observed effect. 

The problem was formulated as follows: Consider a system which 
can take n + 1 states (0, 1,2, ---,m). Let the probability of transitions 
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(¢ — 1 — i) and (i + 1 — 27) during any time dt be, respectively, 
k,dt + o(dt) and g,;dt + o(dt). Let these be the only transitions possible 
during dt, and let the system be in the state 0 at ¢ equal to zero. If 
Y,(é) is defined as the probability that the system be in the state 7 at 
time t, then Y,(0) = 1, Y,(0) = 0 for z > 1, and 


= = kin + gi-1Y; (2.1) 


where Y,,.,; = Y; = g = Oif 7 <0. In the biophysical applications of 
(2.1) only the function Y,(¢) and k; = k or 0, g; = g or 0, where k 
and g are positive constants, was considered. The k,’s depend on the 
intensity of the radiation and on the sensibility of the organism in the 
state (¢ — 1) to a given radiation. Consequently, the k,’s depend on 
the probability of collision between a photon and the sensitive volume. 
The g,’s represent the intensity of recovery of the organism from the 
state (i + 1) to the statez. By the method of the Laplace transformation 
an explicit expression for Y,(¢) under the above assumptions was 
obtained. 

The theoretical results obtained by Opatowski have been applied 
to the experimental work of A. Zuppinger (40) on the irradiation of 
Ascaris eggs with X-rays (26). 

Recently A. T. Reid and H. G. Landau (34) postulated a model for 
the transmission of radiation damage following the initial damage due 
to the absorption of radiation quanta. The mechanism by which this 
transmission takes place was assumed to be the depolymerisation of 
macromolecules associated with the sensitive volume of the organism. 
The mathematical problem was formulated in terms of a random-walk 
problem with two absorbing barriers. A Markov chain with a finite 
number of states was considered: 2, --- E,_; E,. The 
system was considered in the state FE, initially, and following a “hit” 
in the sensitive volume passes to the state HE, . The transitions, 
E, — E, — --- — E, , represent the transmission or amplification of 
the initial damage, and 7, represents the state when the radiation 
damage becomes observable. 

The transition probabilities are defined as follows: 


= j/a, Gj 1), (2.2) 
Di,i-: = 1 — j/a, = 1, 1), (2.3) 
1j=0,a 
Pu * (2.4) 
0, otherwise. 
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The forward transition probabilities (2.2) represent transmission, and 
the reverse transition probabilities (2.3) represent recovery. The 
condition (2.4) means that no change will occur after either the state 
E, or E, has been reached, or in the terminology of random-walk 
there are absorbing barriers at 0 and a. It also means that the system 
will not remain in any of the intermediate or interior states indefinitely, 
but will pass to either FE, or E, . 

It was of interest to determine the probability, \, (probability of 
observable damage), that the system initially in EF, will eventually 
pass to [, ; also the probability, \) (probability of recovery), that the 
system will return to Z. These probabilities are given by 


(2.5) 
where p;°?_, = 0, and 

be = Po = 1 (2.6) 
where = 1. 


In order for the above results to be useful in experimental radio- 
biology we will now give several relationships that result from the appli- 
cation of the above model to many organisms. Suppose we plan to 
check the above model by irradiating N organisms, and wish to obtain 
an expression for the number of survivors under the assumption that 
the above model is applicable to each organism. If the organisms do 
not interact, the probability that there will eventually be no survivors 
out of N organisms will be given by the binomial distribution 


N 


(2.7) 


bine = ( 
If we wish to obtain an expression for the number of survivors after a 
finite period of time (finite number of steps), we must rewrite (2.7) as 


N 


(2.8) 


bing N, = ( 


that is, we have replaced \, and \, by the n-step transition probabilities 
pio and p;.’ respectively. Using well-known relationships we can now 
find the probability that there will be at most n, survivors B(no , N, pio’), 


in terms of the incomplete beta function 
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v(a; N, p'?) 
(2.9) 


Equations (2.7), (2.8), and (2.9) are all functions of the number of 
states a. The number of states has been expressed in terms of the 
number of nucleotide units in the macromolecule assumed to undergo 
depolymerisation following a hit, viz.,a@ = 1 is tantamount to a macro- 
molecule of 100 nucleotide units, etc. (32). 

In an attempt to attach some biological significance to the number 
of states in the above chain model we have encountered the following — 
problem which may be of interest to statisticians. The problem is | 
concerned with estimating the number of states from experimental 
observations. Suppose we consider the following experimental situation. 
Let us irradiate, one at a time, a large number of organisms. Now 
according to the above model an irradiated organism will either die 
or recover, hence there are two observable states. Let ff, #,--- ,& 
be the observed times required for m organisms to die, and let 
ti, i, ---, & be the observed times required for n organisms to recover. 
That these states have be -n reached can be determined by physiological 
and biochemical tests. Now, given the distribution of the ¢*’s and 
the ¢°’s, together with the transition probabilities between states, 
what is the maximum likelihood estimate of a? In order to determine 
an estimate of a it is necessary to make some assumption concerning 
the time required for each transition between states. 

3. Social behaviour. In the development of a mathematical biology 
of social behaviour stochastic methods have played an outstanding 
role. One of the problems receiving attention is that of dominance 
relations in social groups. In particular peck-order or “peck-right”’ 
phenomena in animal societies has been investigated; peck-order being 
defined as a binary asymmetric relation between each pair in a finite 
collection of individuals. 

A. Rapoport has developed a theory of peck-order utilising a simple 
Markov chain model (27). This model is developed as follows: Let S 
represent the structure of a group of individuals, and denote by p,, the 
probability of a change from structure S; to S;. Obviously p;,; repre- 
sents the probability of preservation of the structure S;. It is assumed 
that encounters (or transitions) occur at regular intervals, and this 
interval is taken as the unit of time. Denote by S,(é) the probability of 
occurrence of the structure S; at time ¢. Now S; at time ¢ may have 


; N, pio’) 


| 
| 
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occurred in either of the two ways given by the relation 
The probability of its having come from S; through no change is 
p:iS,(t — 1); and the probability of its having arisen from S; at t — 1 


is p;,;S,;(t — 1). Therefore the occurrence of S; at ¢ from any of the 
possible structures or through no change is 


S,(t) Pei Silt = 1), for all 7. (3.1) 
k=1 


If is the vector {S,(, S.(O, --- , S,(} equation (3.1) can be 
written in vector-matrix form 


= S(t — 1). (3.2) 
By iteration we obtain 
(3.3) 
Therefore 
S(o) = lim (p,,)'S(0). (3.4) 


Rapoport has been able to show that under certain conditions the limit 
in equation (3.4) is independent of S(O). This means that under those 
conditions the structure of the society will tend to a form which is 
independent of the initial structure. This result, which is a familiar 
one in the theory of Markov chains, tells us that the chain is ergodic. 

Recently. H. G. Landau (21) has applied the theory of Markov 
chains to the study of dominance relations and the structure of animal 
societies when social factors, i.e., those due to the existing or previous 
social structure, or dominance relations, or to the outcome of previous 
encounters, are assumed to have some effect.In this study the states 
E, are the possible structures of the society, with transition probabilities, 
Pixs , giving the probability of change from I, to E; as the result of an 
encounter. Before going on, we define several terms which will be 
used later: (1) The dominance structure is the matrix A = (a,;), with 
elements a;; = 1 if ¢ > j (read 7 dominates j), a;; = —1 if j > 7, and 
a;; = Ofori = 1, 2, --- , n; (2) The score structure, V, is the set of n 
integers V = (v, , v2, «++ , ¥,), Where v; means that the 7-th member 
dominates v,; of the others; (3) The hierarchy is the structure V = 
(n — 1,n — 2, «++ , 0); (4) The hierarchy index, h, is defined as 


n ao 2 
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hence h lies between 0 and 1, having the value 1 when V is the hierarchy, 
and 0 when V is the equality, i.e., whenv, = v. = --- =v, = (n — 1)/2. 

Before introducing the effect of social factors, Landau was able to 
show that under certain conditions the Markov chain whose states are 
the different structures is ergodic. This is the same result obtained 
earlier by A. Rapoport. A sufficient condition for ergodicity is that 
all €;;q;; must be greater than zero, where e;; is defined as the probability 
of an encounter between members 7 and j, and q;; is the probability 
that ¢ > j after an encounter. This condition implies that in any 
structure the probability that any dominance relation be either re- 
versed or maintained is not zero, hence p;; > 0, and the chain is said 
to be aperiodic. In addition, since any reversal of dominance relations 
has non-zero probability, it is possible to go from any structure to 
any other by a finite number of one-step transitions. This means that 
the chain is indecomposable. That the chain is ergodic follows by 
using the well-known ergodic theorem for indecomposable Markov 
chains with aperiodic states. 

The social factors considered and the results obtained were: (1) A 
uniform bias against reversal of dominance will have no effect on the 
stationary distribution of the structure of the society; (2) If the proba- 
bility of dominance is a linear function of the previously established 
score, there will be a small tendency for the society to move toward 
the hierarchy. If a member never challenges another whose score 
exceeds his own by two or more, or if he can never dominate it he 
should challenge, then the hierarchy is the only stable structure, i.e., 
it is an absorbing state. 

Before closing this section we remark that a generalisation of the 
above results is possible through the use of models formulated as 
multiple Markov chains. Here one is able to take into consideration 
various hereditary effects which may be important in the determination 
of social structures. 

4. The spread of epidemics and rumours. Early work in the mathe- 
matical theory of epidemics was mainly concerned with the develop- 
ment of deterministic models of the spread of disease through a popula- 
tion. In the deterministic treatment of epidemics the assumption is 
made that for given numbers of susceptible and infectious individuals 
and given infection and removal rates, a definite number of new cases 
would arise in a given period of time. It is known however that many 
chance factors enter into the determination of the number of infected 
individuals at any time, thus pointing up the need for stochastic models 
to supplement or replace deterministic models. 

In a recent paper N. T. J. Bailey (3) has considered the use of sto- 
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chastic models in epidemic theory, and given the stochastic treatment 
of a simple epidemic. The epidemic process is characterised by the 
stochastic differential-difference equations 


=(r+1)\(n—- r)p,+1(0) —rn—rt+ 1)p,(2), 
(r = 0,--- ,n—1,) (4.1) 
and 
= —np,(t), 


where p,(t) is the probability that in a population of n individuals there 
are r susceptibles still uninfected at time ¢. The above equation is 
substantially the same as W. Feller’s birth-and-death equation occurring 
in the theory of population growth. 

The concept of a random net can also be applied to the study of 
epidemics if one assumes the probability of transmission of the disease 
is the same for each pair of individuals in the population. The weak 
connectivity of the net now represents the expected number of indivi- 
duals which will contract the disease eventually, and the strong con- 
nectivity represents the probability that the entire population will 
succumb. 

The branching processes discussed in section II, and the random 
net model can also be used in the study of the spread of rumours. 
Recently A. Rapoport and L. I. Rebhun (30) have applied the theory 
of random nets to the theory of rumour spread. Here the weak con- 
nectivity of the net appears as the saturation fraction of the number of 
“knowers” (individuals that have heard the rumour) in a thoroughly 
mixed population through which the message is diffused, and where 
each knower tells the message to a finite number of individuals. A 
method is given for translating the time course equation of rumour 
spread (where the time is measured in the number of removes from the 
starters) into an ordinary continuous time equation if the distribution 
of the telling intervals is known. This study utilised experimental 
data obtained by 8. C. Dodd and his staff (8) at the Washington Public 
Opinion Laboratory. 

We are now considering the use of age-dependent stochastic branch- 
ing models in the study of epidemics and rumour spread. 

5. Communication nets. A communication net can be defined as a 
collection of nodes with channels connecting pairs of nodes in the col- 
lection. The use of this definition enables us to consider a communica- 
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tion net as a random net in which the points and axones have been 
replaced by nodes and channels. 

The structural properties of communication nets have been in- 
vestigated by using a matrix representation (Ordinary and Boolean 
matrices have been used) of its structure. The net is represented by 
a square matrix A = (a;;), the elements a;; having the value unity 
or zero depending on whether or not a channel connects the i-th to 
the j-th node. The order of A is determined by the number of nodes 
in the net. Stochastic matrices can also be used to represent net struc- 
ture; here the matrix elements p;; represent the probability that a 
channel is between the 7-th and j-th nodes, or the probability that the 
i-th node sends its information to the j-th node. 

At the present time we are considering the application of Markov 
chain techniques to the study of changes in the structure of communi- 
cation nets. In this approach we must consider a Markov chain whose 
states are the different possible net structures, hence we have a chain 
whose states are matrices. A. Shimbel has shown that for a net with 
n nodes there are less than N = 2"~” possible structures (or states). 
In addition to these structures we must add another structure whose 
matrix has elements greater than or equal to unity. This matrix will 
form an absorbing state since it represents the structure of the net 
when each node has all of the information originally possessed by the 
other nodes in the net. The task facing us now is the determination 
of the transition probabilities p,;; . The main difficulty is due to the 
lack of an adequate measure of matrix structure, and no clue as to 
what form the transition probabilities might have. One could get 
some ideas about the latter from experimental studies by utilising the 
methods of T. W. Anderson, which we referred to in section II. 

Recently social psychologists have been using the concept of a 
communication net in studying problem solving by groups (22). Two 
cases have been considered: (i) All members of the group can communi- 
cate freely with one another, and (ii) the communication patterns are 
restricted. An experiment designed by psychologists to utilise the 
results of the Anderson study would be extremely useful in enabling 
the mathematical biologist to develop a realistic model of changes in 
communication net structure. 
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A METHOD FOR COMPARING FLY-REPELLANT SPRAYS 


M. J. R. Hearty 
Rothamsted Experimental Station 


Introduction 


In a standard technique for comparing two fly-repellant sprays, 
batches of mice are treated with the two materials under test and are 
subsequently exposed under anaesthetic to attack by flies of the genus 
Stomorys. A control untreated batch is similarly exposed for purposes 
of comparison. If replicate tests are carried out on several occasions, 
it is found that the attack rates in the control and in the treated mice 
vary considerably, and the problem arises of finding some stable basis 
for comparison. 

In practice, five mice are used on each occasion for each treatment 
and for the controls, and each mouse is exposed to attack by 20 flies 
(this figure is not exact, but the variations in the number of flies are 
small enough to be neglected). The percentage attack rate is worked 
out for each mouse, and these rates are then averaged to give three 
mean attack rates for each occasion of testing. Full biological details 
of the method will be published elsewhere by Mr. L. C. Stones. 


The Mathematical Model 


Suppose on one occasion the observed mean attack rates on the 
controls and in the two treated batches of mice are po , p, , and p, 
(expressed as proportions). We may suppose that only a proportion 
Po of the flies were liable to attack in the treated batches, and hence 
that the proportions actually repelled were (po. — p,)/po and 
(Po — P2)/Po. Thus pi/pPo and p2/Po are estimates of the attack rates 
in the treated mice when the attack rate in the controls is 100%, and 
it seems reasonable to suppose that these rates are essentially constant 
and to base our comparisons upon them. If we write p, , ps for these 
“true” attack rates, the observed rates on occasion 1 are estimates of 


Poi ; Por X Pa Po. X Pr 
and from a set of such observations we wish to estimate p, and pz . 
It is convenient to transform the observations in order to render 
the effects additive. This implies taking logarithms. Writing x = log p, 
the expected values of the observations are given in terms of the param- 
eters by 
Tor 5 Zor + Za Zoi + 
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Estimation of the parameters by maximum likelihood assuming 
binomial distributions leads to rather intractable equations; moreover, 
the variability between replicate mice on one occasion is considerably 
greater than that expected in ordinary binomial sampling. However 
it appears empirically that an observed proportion p is subject to a 
variance roughly proportional to pg/n, where g = 1 — p. It follows 
that a value z has a variance proportional to g/pn, and on this basis 
we can estimate the parameters by weighted least squares. 


The Method of Estimation 


The weights to be used in the calculations should be based on 
expected rather than observed attack rates, and an iterative procedure 
will have to be used. Rather than use the observed values to obtain 
weights for the first cycle of calculations, it is better to start from 
rough preliminary estimates as otherwise convergence may be rather 
slow. To obtain these, set out the observed attack rates (as percentages) 
and their logarithms in a two-way table such as Table 1, which refers 


TABLE I. MEAN ATTACK RATES AND THEIR LOGARITHMS AND PRELIMINARY 
ESTIMATES OF THE CONSTANTS 


O A B 

1 Pp 60.2 4.0 8.8 
= 1.780 0.602 0.944 In = 1.778 

2 Pp 55.2 6.8 10.4 
x 1.742 0.833 1.017 te = 1.773 

3 Pp 41.4 2.8 4.0 
1.617 0.447 0.602 = 1.593 

1.713 0.627 0.854 
= —1.086 = —0.859 


to an actual experiment with three occasions. Working with the 
logarithms, calculate the means of the columns of this table, and by 
subtracting the control figure from the other two, obtain estimates of 
xz, and 2z,. The estimates of the control attack rates can be obtained 
in the same way from the means of the rows, but it is preferable to 
give relatively higher weights to the control figures since they will 
usually be more accurately determined. The exact weight chosen is 
not of great importance; in Table I a weight of 8 has been used which 
slightly simplifies the arithmetic. We thus take 8 times the control 
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figure, add the twe i> «‘mo.t figures, subtract the estimates of x, and 
and divide by i -for example 


Yo. = (8 X 1.786 + 0.602 + 0.944 + 1.086 + 0.859)/10 = 1.773. 


From these prelimin.ry estimates we can calculate the weights 
to be used in the first cycle of least squares adjustment. In Table II 


TABLE II. LEAST SQUARES ADJUSTMENT 


O A B 
1 p’ 59.3 4.9 8.2 
w 146 5 9 160 
wr 259.880 3.010 8.496 271.386 
4 p' 59.3 4.9 8.2 
w 146 5 9 160 
wr 254.332 4.165 9.153 267 .650 
3 p’ 39.2 3.2 5.4 
u 64 3 6 73 
we 103.488 1.341 3.612 108.441 
356 13 24 393 
617.7 8.516 21.261 647.477 


Zo = —0.03125z4 — 0.056257, + 1.69616 
Zoo = —0.0312524 — + 1.67281 
= —0.04110x, — 0.082197, + 1.48549 
12.5647, — 0.809zg = —12.782 
— 0.80914 + 22.4947, = —17.973 


A = 281.9601 
= 0.079777 C2 = 0.002869 = 0.044559 
= 1.07127 = —0.83753 


to. = 1.77675 Zoo = 1.75340 To; = 1.59836 


d.f. S.S. M.S. 
Constants 5 1097 . 8807 
Residual 4 0.4763 0.1191 
Total: 9 1098 .3570 


we set out the expected attack rates p’ calculated from the preliminary 
estimates, the weights w calculated as 100 p’/(1 — p’) and the products 
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of the weights by the observed 2’s, together with row and column totals. 

The next step is to set up the normal equations. There will be one 
equation for each constant—here five in all. The coefficient of x, in the 
first equation is the total weight of all observations involving 2z,; the 
coefficients of the other z’s are the weights of those observations in- 
volving these 2’s as well as x, , and the right hand side is }-wz taken 
over all observations involving x, . All these quantities are found in 
the second column of Table II. The other equations are set up in the 
same way, and the complete set is as follows— 


1324 + 522+ 3%3 = 8.516 

+ + 9X2 + = 21.261 
5t4 + 92g + 16025; = 271.386 
+ + 160202 = 267.650 
3x4 + 6x, + 7323 = 108.441 


The last three equations give 2; , X2 and 23 in terms of x, and zz 
and by substituting in the first two we get a pair of simultaneous 
equations to determine x, and x, . All the coefficients in the normal 
equations are to be found in Table II, and with a little practice it 
becomes unnecessary to write the equations out in full. 

If the equations for x4 and x, are 


ar, — brg =h 
—bi, + chp =k 
we calculate 
=c/A Cie = Coo = a/A 
Cyh + ek Lp = Cygh + cook 


and 2 , %o2 and 23 follow from the last three normal equations. If 
the values. obtained differ markedly from the preliminary values, 
a second cycle of calculations should be performed, basing the weights 
on the values of p’ calculated from the results of the first cycle. In 
our example, a second cycle is hardly necessary. 


8 
> 
II 


Precision of the Estimates 


It may be desired to attach standard errors to x, and x, although 
these will have to be interpreted with caution as the underlying distri- 
butions may be rather far from normal. We require for this purpose 
an analysis of variance. The sum of squares accounted for by the 
constants is found by multiplying each constant by the right hand 
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side of the corresponding equation in the last cycle of calculations and 
summing the products. The total sum of squares is found by multi- 
plying each observed x by the corresponding wr from Table II (or its 
equivalent in a later cycle if one is carried out) and summing. By 
subtraction we obtain a residual sum of squares (here with 4 d.f.) and 
hence the residual mean square, s*. The variances of the constants 
are then estimated by 


V(ta) = = Coos” V(ta — 22) = — + 


This estimate of error is based on very few degrees of freedom, 
and may be supplemented by an alternative estimate derived from 
differences within the groups of five mice. Comparison of the two 
estimates will also provide a test of the goodness of fit of the assumed 
model. The usual procedure would be to start by transforming the 
attack rates on individual mice into logarithms, but this is not possible 
here owing to the occurrence of zero values. Instead we can calculate 
for each group of five mice an estimated variance of the observed 
attack rates, >.(p — p)*, with 4 d.f. From the values of p’ used in 
the final cycle, we obtain a “theoretical”? value p’(1 — p’)/20. If we 
sum these quantities over all the groups and take the ratio of the sums, 
we obtain a “heterogeneity factor” a, such that the variance of an 
observed attack rate p is given by apg/n. To render this comparable 
with the estimate obtained from the analysis of variance we must 
multiply by (log,e)” to allow for the fact that common logarithms 
have been used. The “within groups” estimate of error variance is 
thus 0.1887 a, and this can be used in place of the “between groups” 
estimate unless the latter is substantially the greater of the two. In 
our example the sum of the empirical within-group variances was 
833.0 while the sum of the theoretical variances is 530.0, so that a = 1.57 
and s” = 0.296. Using this estimate of variance we can summarise the 
results in the form 


= —1.071 + .154 
Zp = —0.838 + .115 
La — Xe = —0.234 + .187 


In general, the control groups will receive a higher weight than the 
treated groups since they give a higher attack rate. It would thus be 
preferable to allocate more mice to each treated group than to the 
controls. This is easily allowed for in evaluating the weights. 

I am grateful to Mr. L. C. Stones of the Cooper Technical Bureau, 
Berkhamsted, for proposing this problem and for allowing me to use 
the data from his experiments. 


AN EXAMPLE OF THE USE OF FRACTIONAL REPLICATION’ 


O. KEMPTHORNE AND R. G. TISCHER 
Iowa State College 


I ntroduction 


The principles of fractional replication have been laid down by 
Finney (1,2) and Kempthorne (3,4). These principles are of considerable 
utility in exploratory research work, when the research worker has to 
consider a number of factors in combination, but is clearly unable to 
test the full factorial set. 
The purpose of this paper is to illustrate the use of these principles 
in what may be regarded as a complex situation. The problem to which 
this application has been made was the determination of various aspects 
of the dehydration of sweet corn. 


The Problem 


The production of dehydrated sweet corn ordinarily includes a 
consideration of: 


(1) Factors of growth 

(2) Varietal characteristics 

(3) Stage of harvesting 

(4) Method of preparation of raw products 
(5) Method of Blanching 

(6) Method of Dehydration 

(7) Conditions of storage 

(8) Evaluation of the final product 


1The paper reports research sponsored by the Quartermaster Food a1d Container Institute for the 
Armed Forces, and has been assigned no. 367 in the series of papers approved for publication. The 
views or conclusions contained in this report are those of the authors. They are not to be construed as 
necessarily reflecting the views or endorsement of the Department of Defense. Journal Paper No. 
J-2265 of the Iowa Experiment Station Projects 890 and 1123, Ames, Iowa. 
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These factors may be conveniently divided into three groups dealing 
with growth, manufacture and acceptability of the product. 

Since there are only about eight golden varieties of sweet corn which 
are normally grown in mid-western states, the experimental procedure 
was arranged to include all eight varieties. Because a comparison of 
varieties for dehydration could only be made taking account of the 
remaining important variables in the dehydration procedure, this work 
was set up to include consideration of varieties, field replications, 
blanching, dehydration and storage. 

Obviously this list does not include all of the variables which might 
be considered to have an effect on the final product, but it was thought 
that these variables represent the smallest number of the most important 
variables which should be considered in one experiment. 

Sweet corn is ordinarily harvested at a moisture content of 65 to 
75 percent for any processing operation and this includes the operation 
of dehydration. It has been found that sweet corn is most acceptable 
when harvested in this range of maturity. Following harvesting and 
preparation, the corn kernels must be blanched to inactivate the enzymes 
contained in the kernel which, should they remain active, would cause 
a loss in quality generally, and specifically the development of off- 
flavors in the corn. The operation of blanching includes the application 
of a temperature level for a given length of time sufficient to inactivate 
the enzymes which cause loss of quality. In the dehydration procedure, 
drying is carried out by one of several means in a current of relatively 
dry air until the corn has reached a moisture content, usually around 
five percent, which has been found desirable for maintaining the quality 
of the product during subsequent storage. Following dehydration, the 
product may be packaged in a variety of ways, again in an attempt to 
maintain quality. In addition to packaging, the temperature of storage 
and the duration of storage both exert effects upon th» ultimate quality 
of the product. In the light of these observations, it appears to be 
necessary to make any comparison of the quality of sweet corn obtained 
from one of another variety in terms of the processing operations, their 
intensity and extent. 

Since adequate objective tests are not available as criteria of quality 
of dehydrated corn, among other food products, it was necessary to 
use a taste panel to evaluate the final product. These evaluations 
were made in terms of appearance, odor, flavor, texture and color by 
three experienced judges. 

The particular aspect of the whole process which prompted the 
present investigation was the need for information to indicate the suit- 
ability of the varieties considered for dehydration. As has been pointed 
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out, no comparisons of these varieties could be made without considera- 
tion of the manner of preparation, and an apparently simple question 
has led us into the necessity of considering a rather complex one, analo- 
gous to Fisher’s discussion (5) of the desirability of factorial experi- 
mentation. 

From the point of view of experimental design the problem was 
resolved into finding a method for the inclusion of all the factors men- 
tioned in a design. It seems impractical to examine all these factors 
exhaustively in a single experiment, and an exploratory experiment was 
therefore considered which would indicate the degree of importance of 
the various factors. 


The Structure of the Experiment 


Given the above factors which are to be incorporated in the experi- 
ment, it is necessary to decide the ways in which these factors are to be 
allowed to vary, the number of levels of each and so on. The decisions 
on these points are of course partially based on subject matter knowledge, 
but must take into account the difficulties which may arise in the 
statistical design. For example, the authors know of no way of per- 
forming a 5 X 3 X 2 design under reasonable assumptions except in 
complete replicates. 

The 2” factorial system is particularly suitable for fractional repli- 
cation. In fact fractional replication is not useful with mixed factorial 
systems and the 3" system leads rapidly to an unduly large number of 
treatment combinations, since a 1/3 replicate is advisable only with 
5 or more factors. In view of these facts and the general exploratory 
nature of the study, it was therefore decided to work within the 2” 
system. The 2” system has a further advantage that it is possible to 
include factors at 4 or 8 levels (in fact any power of 2). 

As mentioned earlier, eight varieties of golden sweet corn were 
included. Four dates of harvesting based on the moisture content of 
the maturing kernels were used. In the blanching operation two 
combinations of time and temperature were considered while in the’ 
dehydration operation, temperature was controlled at two levels, the 
duration being what was necessary to reduce the moisture to a chosen 
level. This procedure was used in an attempt to discover something 
about the effect of these temperature combinations on quality of the 
final product. The temperatures of storage and lengths of storage were 
chosen to cover a reasonably wide range. 

Each of the factors with a number of levels equal to a power of 2 
may be represented by a suitable number of pseudofactors each with 2 
levels. The resulting factors may therefore be represented as follows: 


wal 
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varieties: a,b 
dates: d,e 
iield replicates: s 
time-temperature of blanching: 
temperature of dehydration: 
temperature of storage: 

length of storage: 


and formally we have a 2" experiment, that is, one with 11 factors each 
at 2 levels. The particular way in which the levels of the pseudofactors 
correspond to the levels of the factors is unimportant except for purposes 
of identification. 

The next step in the construction of a design is the choosing of a 
suitable fractional replicate and this is done by choosing an appropriate 
identity relationship. With 11 factors, some of which are pseudo- 
factorially related as in the above, it will be found that a 1/4 replicate 
of the full 2048 treatment combinations can be chosen in such a way 
that no interactions of two factors (not pseudofactors) are mutually 
confounded. 

The choice of identity relationship depends on the experimental 
limitations. With the particular resources available, 4 plots for each 
variety were possible, and as far as varieties are concerned the plan 
necessarily consisted of 4 randomized blocks of 8 plots. A completely 
randomized design as regards varieties could have been used, but was 
expected to be less efficient. These plots were to be divided into 4 
split-plots for the dates, and then each split-plot divided into 4 parts 
or split-split plots for the other treatments. With a fully replicated 
experiment each split-plot would have been divided into 16 parts for the 
other treatments, but this would have led to the impossibly large total 
number of treatment combinations. What was used amounts then to a 
combination of split-split-plot confounding and fractional replication. 
A design of essentially different structure would have been impossible. 

The identity relationship actually used is the following: 


I = ADRFG = BESHJ = ABDERSFGHJ. 


This identity relationship specifies the treatment combinations to eb 
tested in the following way: let x, , 22, «++ , 2;, correspond to the factors 
or pseudofactors a to j, in the order listed above, and let x, equal 0 
if factor a is at the lower level and xz, equal 1 if factor a is at the upper 
level and likewise for 22,23, °**,2;,;. Then the treatment combinations 
tested are those for which both 
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X, + + + + is even 
and Xo +25 + + Mio + X11 IS even 


The consequences of using only the selected subset of the treatment 
combinations may be written down easily from the identity relationship. 
Thus multiplying the whole relationship by A and imposing the rule of 
replacing A* by unity, we get 


A = DRFG = ABESHJ = BDERSFGHJ. 


This says that with the selected subset of treatment combinations the 
main effect of factor a is completely confounded with the four factor 
interaction DRFG, the five factor interaction ABESHJ and the nine 
factor interaction BDERSFGH J. 


Similarly we have 


AD = RFG = ABDESHJ = BERSFGHJ. 


It may be verified that no two factor interactions are mutually con- 
founded; where for example AD, ABCDE are both in fact two-factor 
interactions, each being a single degree of freedom contrast in the total 
of 21 degrees of freedom for varieties by dates. 

The above may appear somewhat obscure to some readers, and it may 
help them to discuss briefly a fractional design which we would not have 
used. Consider the identity relationship 


I = ABDEF = CDEGH = ABCFGH. 


The consequences of this relationship may be traced out as before. 
Thus for example 


A = BDEF = ACDEGH = BCFGH 


and so on, so that main effects are confounded with four and five factor 
interactions. However, we also find 


ABCF = CDEA = ABDEFGH = GH 


so that ABCF, which is in fact a one degree of freedom contrast from 
the interaction of varieties and time-temperature of blanching, is 
completely confounded with the two-factor interaction GH. 

The enumeration of the 512 treatment combinations which are to be 
tested with the chosen identity relationship is most easily made with 
1.B.M. equipment. First it is necessary to construct the 512 pseudo- 
factorial combinations. In the above instance we may write out the 
16 combinations on a, d, r, f and g which are even with respect to 
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ADRFG, that is, the 16 combinations on a, d, r, f and g which have zero, 
2 or 4 letters in common with ADRFG, and the 16 combinations on 
b, e, s, h, j which are even with respect to BESHJ. From these we 
generate all the possible 256 combinations of the two sets of 16, and 
then we take these 256 combinations both with and without c. All 
these operations are done with I.B.M. equipment, presence being 
indicated by a “1” in the appropriate column and absence by a ‘0’. 
Having obtained the 512 pseudofactorial combinations we may then re- 
code to the original factors by sorting in master cards and gang punching. 

The randomization procedure was the standard one for split-split 
plot designs. 


Analysis of the Experiment 


As we have indicated in the introduction, the characteristics of 
interest were subjective observations on the samples of corn, obtained 
from the cans after the designated length and temperature of storage. 
These observations were made by three workers in Food Technology at 
Iowa State College. Information would be of the most value, of course, 
when obtained from a sample of some specified population of tasters but 
in the absence of such a defined population, it appeared best to use 
available staff with previous experience in the area of investigation. 
Inferences which are drawn from the experimental data must therefore 
be restricted to the judges actually used and analyses were based 
consequently on the total responses for the three judges. 

The structure of the analysis of variance is determined by two 
considerations: 


(1) the assumptions made in using the identity relationship (namely 
that interactions of three or more factors were negligible) and 
(2) the restrictions used in the randomization. 


The analysis of variance as in all cases of split-split-plot designs, con- 
sists of three portions: | 


(1) the analysis of whole plot totals 
(2) analyses within whole plots between split-plot totals and 
(3) analyses within split-plots between split-split plots. 


This analysis of variance is based on the assumption that all interac- 
tions involving three or more factors are zero. The only part of this 
partition which merits comment is the inclusion of FG and HJ in the 
split-plot portion and not in the split-split-plot portion. This comes 
about because from the identity relationship we obtain 
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TABLE 1. PARTITION IN ANALYSIS OF VARIANCE OF EXPERIMENT 


df 

Replicates (RS). . 3 
Whole-plot error ........ 21 
Dates X varieties ........ 21 
Varieties X F, X°G, XH, XJ. . 28 
Mates 7,00, X 12 
Split-split-plot error... .... 336 


FG = ADR = BESFGHJ = ABDERSHJ 


so that FG is completely confounded with ADR which is a component 
of the split-plot error, in fact, being a component of varieties X dates X 
replicates. The same reasoning holds for HJ. 


Results of the Experiment 


It is not our purpose to review here at all extensively the result of 
the experiment. We shall instead present the analysis of variance for 
one of the characteristics of interest and a few summary tables with 
estimated standard errors. Our sole purpose is to show that the pattern 
of experimental observations was effective for the purposes of a pre- 
liminary overall investigation. 

As mentioned previously the experimental units were measured 
subjectively, and a scale of 0 to 9 was used. The analysis of variance 
given in Table 2 for the appearance of the resulting corn is typical. 

From this analysis of variance, we see that the experiment demon- 
strated effects due to varieties, dates and their interactions and to the 
storage treatments, but that the blanching and dehydration treatments 
were almost ineffective in changing the characteristic. There is a little 
evidence for an interaction of the blanching and dehydration treatments. 

As an example of effects of treatments, the following was found for 
varieties and dates with regard to appearance: 


BIOMETRICS, SEPTEMBER 1953 


TABLE 2, ANALYSIS OF VARIANCE FOR APPEARANCE 


Source of variation 


Degrees of freedom 


Sum of Squares 


Mean Squares 


Rep 3 38.71 
Var 7 1,190.19 170.03** 
Error A 21 635.23 30.25 
Total 31 1,864.13 
Date 3 9,046.14 3,015.38** 
Date X var. 21 1,502.80 71.56** 
FG 1 116.28 116.28** 
HJ 1 578.00 578 .00** 
Error B 70 2,787.96 39.83 
Total 127 15,895.31 
F 1 36.12 36.12 
G 1 12.50 12.50 
H 1 5,189.26 5,189.26** 
J 1 2,601.01 2,601.01** 
FH 1 94 .94 
FJ 1 29.07 29.07 
GH 1 23.63 23.63 
GJ 1 1.76 1.76 
Var X F 7 51.32 7.33 
Var X G 7 146.63 20.95 
Var X H 7 418.62 59.80* 
Var X J 7 722.62 103 .23** 
Date X F 3 61.59 20.53 
Date X G 3 39.02 13.01 
Date X H 3 467 .82 155.94** 
Date X J 3 141.98 47 .33 
Error C 336 7,303.68 21.74 
Total 511 33 , 142.88 


A discussion of the utility of the experimental results is given by 
It is sufficient to note that the experiment served 
its purpose in picking out clearly factors which are important, and in 
establishing the existence of some interactions. 


Tischer et al (6). 
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TABLE 3. VARIETY AND DATE EFFECTS AND INTERACTIONS 


Date 
Variety Mean 
1 2 3 4 
1 2.56 3.59 4.45 5.00 3.90 
2 3.19 4.23 4.53 5.10 4.26 
3 3.51 5.04 §.17 4.92 4.66 
4 3.09 4.34 4.68 5.28 4.35 
5 2.78 3.92 5.08 5.01 4.20 
6 3.24 3.89 4.97 5.17 4.32 
7 3.63 3.50 4.05 3.95 3.78 
8 3.08 4.06 4.83 5.07 4.26 
Mean 3.14 4.07 4.72 4.94 4.22 
Standard errors: of variety means: 0.12 
of date means: 0.09 
of entries in table for interaction: 0.26 
Average Effects of Blanching, Dehydration and Storage 
Blanching | Dehydra- Storage 
tempera- | tion tem- Mean 
ture perature 3 months | 6 months 
Low 4.26 4.24 70°F 5.30 4.19 4.75 
High 4.17 4.19 100°F 3.88 3.49 3.69 
4.59 3.84 


The standard error of means for blanching temperature, dehydration tempera- 
ture, storage time and storage temperature is 0.05. 


REFERENCES 


(1) Finney, D. J. The fractional replication of factorial experiments. Ann. Eug., 
12: 291-301, 1945. 

(2) Finney, D. J. Recent developments in the design of experiments. III Fractional 
replication. J. Agr. Sci., 36: 184-191, 1946. 

(3) Kempthorne, O. A simple approach to confounding and fractional replication 
in factorial experiments. Biometrika, 34: 255-272, 1947. 

(4) Kempthorne, O. The design and analysis of experiments. Chapters 20 and 21. 
John Wiley and Sons, New York, 1952. 

(5) Fisher, R. A. The design of experiments. Oliver and Boyd, Edinburgh, fourth 
edition, 1947. 

(6) Tischer, R. G., E. W. Jerger, O. Kempthorne, A. F. Carlin, J. A. Zoellner. 
Influence of variety on quality of dehydrated sweet corn. Food Technology 
(In press). 


j 
— 


‘ A STATISTICAL DESIGN FOR THE EFFICIENT REMOVAL 
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SUMMARY 


The method of construction of a certain class of designs for use with 
quantitative factors by means of which a trend occurring during a com- 
parative experiment may be eliminated without loss of efficiency is 
indicated. The design and analysis is illustrated with an example 
from biological assay. 


1. INTRODUCTION 


In recent work (Box and Wilson, 1951; Box, 1952) it has been 
emphasised that, when dealing with quantitative factors, (i. e. factors 
like temperature, time, dose, which can be varied on a continuous 
scale) it is frequently advantageous to employ statistical designs which 
depart from the “factorial” principle. It has long been appreciated 
that it is the orthogonal property of designs which produces efficiency. 
The factorial method, in which are tested all (or a selected fraction) 
of possible combinations of a few levels of each of the factors, is often a 
convenient way of obtaining orthogonality but it is, of course, by no 
means the only way, nor is it, in all circumstances, the ‘best’ way. 
Designs which may be used to determine the effects of a number of 
factors but which are not necessarily factorial designs or parts of 
factorial designs may conveniently be called multi-factor designs. 

When we are dealing with quantitative factors if we do not limit 
ourselves to designs involving a few equally spaced levels of the factors, 
a great gain in flexibility results and a widening of the possibilities for 
developing designs for specialised needs. Furthermore, there is no 
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very great increase in labour in the analysis of the experiment, and 
urrangements involving fewer observations often result. For example, 
in the first of the papers referred to above ‘‘composite”’ multi-factor 
designs of second order, that is to say designs which allowed the determi- 
nation of all the first and second order effects (linear effects, linear X 
linear interaction and quadratic effects) were developed which were at 
least as efficient as the orthodox three-level factorials but required 
many fewer experiments. In the second paper first order multi-factor 
designs of maximum efficiency for determination of the linear effects 
of any number of factors were described. These latter designs allowed 
simultaneous elimination of trends in the observations (e.g. time trends) 
without loss of efficiency, whilst a procedure called ‘‘angular randomisa- 
tion’”’ rendered subsequent statistical tests exact whatever the distri- 
bution of the observations and whether the assumed mathematical 
model was exactly realised or not. 

Work on a related problem has been published recently by Cox 
(1951) whose designs retain the factorial principle, but in certain 
circumstances still allow the efficient elimination of time trends. The 
purpose of the present paper is to develop a further design of the multi- 
factor type which makes possible the efficient comparison of two re- 
sponse curves, when a time trend in the results is simultaneously 
occurring and to show its application in a problem of biological assay. 


2. EXAMPLE 


A design of this type was used in the biological assay of d-tubocur- 
arine chloride. It was required to compare the estimated positions, 
slopes and curvatures of the dosage-response curves for a test material 
and a standard material whilst simultaneously eliminating, without 
loss of efficiency, any time trend which occurred in the course of the 
assay. 


2.1 Description of the Assay 


For this assay the phrenic nerve of a rat is dissected out and con- 
nected to a pointer which records on a smoked drum. This nerve which 
is immersed in Ringer Solution is given an electric stimulus every 12 
seconds which produces a contraction. To obtain an accurate measure- 
ment the apparatus is allowed to operate under the same conditions 
for a period, the average recorded contraction being the value used. 
When a suitable dose of d-tubocurarine is added, a reduction of the 
contraction occurs. This reduction measured as a percentage is called 
the percent inhibition and is the response used to assay the drug. The 
apparatus is thoroughly washed out and a second dose is tested. In 
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this way series of graded doses are given for the test and standard 
materials and the corresponding responses are measured. From these 
observations dosage response lines may be constructed from which the 
relative potency of the test material in terms of the standard may be 
calculated. 


2.2 Occurrence of Time Trend 


Unfortunately, as the test proceeds it is observed that the response 
(i.e. the % inhibition) becomes steadily larger due to tiring of the 
phrenic nerve, incomplete washing, or some other external cause not 
fully understood. Partial elimination of this trend could be attained, 
of course, by the use of a randomised block or an incomplete block 
design, the blocks being successive periods of time. Unless very small 
blocks could be employed, however, the error variance would be inflated 
by trends within the blocks, and an alternative method seems desirable. 


2.3 Principle of the Present Design 


In the procedure we adopt, the block size is reduced to two, that is, 
the experiments ‘are performed in pairs. In the first pair a dose D* 
of the test preparation is assayed against the same dose D, of the 
standard preparation. In the second pair of experiments the test 
preparation is assayed against the standard preparation at a dose D, 
and so on until k pairs of doses have been given. The order in which 
the preparations are tested within each pair is random.** We assume 
that the time trend which occurs during the experiment can be repre- 
sented by a polynomial of degree p fitted to the block means. The 
problem is to choose the dose levels so that this trend can be eliminated 
without any loss of efficiency in the estimation of the effects. This is 
accomplished by so arranging the doses that the effects to be estimated 
are orthogonal to the estimated coefficients of the fitted polynomials. 
If desired, the trend fitted to the pair-means may then be used to 
eliminate trend effects within the pairs which might otherwise inflate 
the error. 

For the sake of clarity we shall show first how such the design may 
be used in practice, leaving the method of derivation to §4. 


*In practice of course the test preparation is suitably diluted so that equal quantities of test and 
standard preparations give approximately equal responses. It would, therefore, be more exact to say 
that the doses in each pair were proportional rather than equal. 

**Here we have assumed that the order is completely random. An alternative procedure which 
may be adopted when k is even and which hag certain advantages is to randomise subject to the con- 
dition that in (4)k of the pairs the test preparation is assayed first and in the remainder the standard 
preparation is assayed first. 
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2.4 Levels of Dose Used 


In the design used for this particular assay eight pairs of doses 
were used as shown in Table 1. It was expected from previous experi- 
ence that the response/log dose relationship would be linear for both 
test and standard, that the lines would be parallel, and the increase in 
response with time would be roughly linear and could be adequately 
represented by, at most, a quadratic polynomial. Provision was made 
in the design used to check these postulates, and less restrictive assump- 
tions than those implied above were actually adopted. In fact effects 
one degree higher than those expected were allowed for, and it was 
arranged that: 


i) Only four distinct dose levels were employed. 

ii) Quadratic response/log dose lines could be fitted if necessary. 

iii) A cubic polynomial could be fitted to the time trend. 

iv) Differences between test and standard preparations in the dose 
and time trend curves could be detected. (i.e. interaction effects 
between samples and the dose and time effects could be esti- 
mated). 

v) All the effects listed above were orthogonal one with another. 


It is shown in §4 that, assuming the doses to be given at equally 
spaced intervals of time, one suitable arrangement is obtained if the 
log dose level are arranged so that the deviations from the mean log 
dose levels are proportional to the numbers, 0.55500, — 1.50206, 1.17615, 
—0.22909, —0.22909, 1.17615, —1.50206, 0.55500, the doses being 
administered in the order indicated. This means that the actual log 
levels should be X, = m + 0.55500 o, X. = m — 1.70206 a, --- , 
Xs, = m + 0.55500 o where m and ¢ are arbitrary constants so chosen 
that the design covers the required region of dose levels. It will be 
noted that, as mentioned above, only four distinct dose levels are used. 

Although, of course, the dose of drug cannot be administered in 
practice to the five-decimal accuracy given above, the error in ad- 
ministering the doses should be no more than that involved in attempting 
to administer the integer levels usually required by the orthodox type 
of design and we shall retain the full five-decimal accuracy in our 
calculations to avoid introducing computational inaccuracies. 

In the particular assay described the dosage region of interest lay 
between 2007 and 2707. The corresponding region of log dosage is 
from 2.301 to 2.437. It is found by trial that by taking m = 2.370 
and ¢ = 0.050 a suitable coverage is obtained and we have for the four 
log dose levels: 2.398, 2.295, 2.429, 2.359 corresponding to actual dose 
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levels of 2507, 1977, 268y and 228y to the accuracy attainable. These 
doses of test and standard were, therefore, administered at equally 
spaced time intervals in the order shown in Table 1, the order within 
pairs of test and standard preparations at each dose level being decided 
by the toss of a coin. 


TABLE lt. EXPERIMENTAL PLAN SHOWING ORDER IN TIME OF DOSE LEVELS AND 
RESPONSES OBSERVED. 


Experiment No. Dose 
4 197 , 20.9 
0 rd 513 
12 268 
i 197 ae 
i 250 ; 


3. ANALYSIS OF THE RESULTS 


The constants required in the calculation are set out in Table 2 
whilst the analysis of variance for the data is shown in Table 3. The 
total crude sum of squares based on sixteen degrees of freedom is entered 
in the first row of Table 3 and this is then split into two parts each based 
on eight degrees of freedom and accounting for the variation ‘between 
pairs’ and the variation ‘within pairs’ respectively. The former is the 
crude sum of squares calculated from the pair-sums, denoted by y, , 
shown in column vi of Table 2, and the latter the crude sum of squares 
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calculated from the pair-differences (test minus standard) denoted by 
ya » Shown in column vii of Table 2. The correction for the mean is 
next calculated in the usual way for both sums and differences. In the 
case of the differences the correction for the mean is in fact the sum of 
squares due to the difference in average response between the test samples 
and the standard samples and is called “samples” in Table 3. The 
corresponding entry in the effect column is the difference in mean re- 
sponse for test and standard samples. Sums of squares calculated direct 
from the sums or differences must, of course, be divided by two before 
entering in Table 3. 

The two sets of seven degrees of freedom remaining after elimination 
of the means are now further analysed using the sets of constants given 
in Table 2. 

In columns i), ii), and iii) of Table 2 are shown the values (as they 
are given by Fisher and Yates, 1942) of the orthogonal polynomials 
for calculating the linear, quadratic, and cubic time effects. In our 
notation these are denoted by ¢, , ¢. , and ¢;.. Columns iv) and v) show 
the orthogonal polynomials d, and d, for calculating the linear and 
quadratic dose effects. The method of arriving at the levels d, and 
d, is explained in §4. They are such that besides having zero sum of 
products with each other they also have zero sum of products with 
t, , tg , ts , thus all the effects to be calculated are orthogonal and may 
be computed independently. 

For example, by taking }> d,y, , the sum of products between d, 
and y, and dividing by = d; = 8, the sum of squares for d, , we have 
an estimate of the sum (test + standard) of the slopes of the response/ 
log dose curves. By taking the sum of products ba d,ya of d, with the 
differences y, and dividing by >> d? = 8, we have an estimate of the 
differences (test — standard) of these slopes. If, as is expected, the 
slopes for the test and the standard preparation are the same, then 
half the former quantity will provide the estimate for the common 
slope. On the other hand, if a discrepancy in slopes is found then the 
individual slopes for test and standard may be found from the estimates 
of the sum and differences. The corresponding sums of squares for 
the analysis of variance are given by half the square of the sum of 
products of d, and y, (or y,) divided by the sum of squares of d,. The 
factor of a half appears because y, and y, are each calculated from two 
observations. In general, we may calculate all the effects and the 
appropriate sums of squares in this way from the two simple formulae 


i) Sum (or difference) of effects = >> xy/>> 2” 
ii) Sum of squares = zy)?/2>> 2” 


where z is t, , t , ts , d; or d, and y is y, or Ye. 


| | : = 
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TABLE 2. ORTHOGONAL POLYNOMIALS FOR TIME TREND AND DOSE, WITH SUMS, 
DIFFERENCES AND CORRECTED DIFFERENCES OF PAIRS OF OBSERVATIONS. 
@) (ii) (iii) (iv) (v) (vi) (vii) (viii) 
Time Trend Dose Response 
Us Ud Ue 
Uncor- Cor- 
Linear Quadratic | Cubic Linear Quadratic Sums rected rected 
Differ- Differ- 
ences ences 
t te ts dy da (Test + | (Test — | (Test — 
standard) | standard) | standard) 
-7 7 -7 0.55500 —0 .46956 107.1 5.7 4.65 
-5 1 5 —1.50206 0.65424 68.4 6.6 5.55 
-3 -3 7 1.17615 0.85467 | 123.0 3.3 4.25 
-1 —5 3 —0.22909 | —1.03933 | 100.8 3.4 4.45 
1 -5 -3 —0.22909 | —1.03933 | 101.9 6.7 5.65 
3 -3 -7 1.17615 0.85467 | 132.1 3.5 4.55 
5 —1.50206 0.65424 78.6 4.2 5.25 
7 y 7 0.55500 | —0.46956 | 120.9 5.7 6.75 
Divisor 
168 168 264 8 4.91837 
TABLE 3. ANALYSIS OF VARIANCE (SUMS) 
Source of Variation Effect Sum of D/F Mean 
Squares Squares 
Grand Total 45,136.26 16 
BETWEEN PAIRS (Effect 
Sums) 
t+s 
Total 45 ,033.70 8 
Correction for mean 43,347.24 1 
Linear 1.048 92.19 1 92.10°*** 
Time Quadratic —0.213 3.81 1 3.81 
Cubic —0.081 0.87 1 0.87 
Linear 19.917 1586.75 1 1586 .75*** 
Quadratic —0.718 1.27 1 1.27 
Residua 1] 1.57 2 0.79 
Error (1) 3.71 4 0.93 
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TABLE 3. ANALYSIS OF VARIANCE (DIFFERENCES) 


WITHIN PAIRS (uncorrected)| (Effect 
Differences) 
t—s 
Total 102.56 8 
Samples 4.875 95.06 1 95.06*** 
Linear —0.046 0.18 1 0.18 
Samples X Time 4 Quadratic 0.119 1.19 1 1.19 
Cubic 0.000 0.00 1 0.00 
Linear —0.541 1 
| 
—0.622 0.95 1 0.95 
Residual 4.01 2 2.01 
Error (2) 4.96 4 1.24 
WITHIN PAIRS (Effect 
(Corrected for trend) Differences) 
t-—s 
Total 104.16 8 
Samples 5.005 100.20 1 100.20*** 
Linear 0.022 0.04 1 0.04 
Samples X Time { Quadratic 0.100 0.85 1 0.85 
Cubic 0.046 0.28 1 0.28 
Linear —0.388 0.61 1 0.61 
Sampl Dose 
~0.441 0.48 1 0.48 
Residual 1.70 2 0.85 
Error (3) ; 2.46 4 0.62 


Proceeding in this way the sums and differences of effects due to 
linear, quadratic and cubic time trends, and linear and quadratic dose 
effects and their accompanying sums of squares are calculated and are 
entered in Table 3. Two residual degrees of freedom for the between 
pairs comparison and two residual degrees of freedom for the within 
pairs comparisons remain. On comparing the effect mean squares in the 
table with the appropriate residual mean square it is at once apparent 
that there is no reason to doubt the postulates, (in mind when the ex- 
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periment was designed) that a quadratic time trend and linear response/ 
log dose curves would prove adequate to represent the time and dose 
effects. The sum of squares due to cubic time effects and quadratic 
dose effects were, therefore, combined with the residuals to give error 
(1) and error (2) each an appropriate estimate of error for the effects 
in the corresponding part of the table and each having four degrees of 
freedom. Significance of the effects when compared with the appropri- 
ate error mean square is denoted by asterisks. (Three asterisks denote 
significance at the 0.1% level). 

From the between pairs analysis it is seen that there is a large time 
trend effect almost completely accounted for by the linear component. 
There is also a slight suggestion of a quadratic effect but this is not 
sufficiently large to be definitely established. 

From the within pairs analysis we see that there is a real difference 
in potency between the test and standard preparation, but there is no 
evidence of difference in slope or curvature for the time trend lines or 
response curves. 


3.1 Elimination of Trend Effect Within Pairs 


We have seen that a large linear time tread is occurring and that 
consequently some of the error within groups is due to the change in 
the level of response occurring in the interval between successive tests 
within pairs. The doses are spaced (see column (i) of Table 2) so that 
one unit of time elapses between the first test and the second, and the 
difference (second test minus first test) is therefore overestimated by 
an amount £, where 6 denotes the regression coefficient of the common 
linear time trend for test and standard preparations. It follows that 
the variance of the difference (test minus standard) is 2c” + 6” and there- 
fore that error (2) estimates o” + 38’. We may correct for the trend 
within-pairs therefore by adding or subtracting an amount b from the 
eight differences, where b = 0.524 is an estimate of 8 and is given by 
3 X (1.048) taken from Table 3. It is subtracted from those pairs where 
the test preparation was assayed after the standard preparation and 
added in the contrary case. The “corrected” difference computed in 
this way to two-decimal accuracy and shown in column (viii) of Table 2 
is denoted by y,. After correction the difference (second test minus first 
test) is overestimated by an amount 6 — b. It follows that the variance 
of the corrected difference (test minus standard) is 2c” + o°(b), where 
o’(b) is the variance of b. Consequently, error (3) estimates o” + 407(b). 
Comparing this with the previous expression we see that it will probably 
be advantageous to correct for the discrepancy due to trend within pairs 
if b > o(b). The analysis of the corrected differences given in Table 2, 
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is carried in a manner exactly similar to that already described for the 
uncorrected differences. 

Now o°(b) = o7/(2 >> = o°/336. Hence the mean square for 
error (3) calculated from corrected differences estimates (1 + 1/672)o’. 
The mean square for error (3) multiplied by 672/673 is thus an unbiased 
estimate of o”. It will be seen (as would be expected if the mathematical 
model assumed were adequate) that estimates of o” from error (1) and 
from error (3) are compatible and a final combined estimate s” based on 


eight degrees of freedom may be calculated from the combined sums of 
squares. 


{sum of squares for error (1)} 


+ {672/673 < sum of squares for error (3)} 


Dividing by eight we have the estimate of the combined error variance 
s’ = 0.70. 


3.2 Calculation of Relative Potency 


If 1 is the estimated difference (in units of the design) between dose 
levels of test and standard preparations giving the same response, e is 
the estimated common slope of the log dose/response curves and a is the 
average corrected difference in response then 


In log dose units, that is 0.503 X 0.050 = 0.025. Now antilog 
0.025 = 1.059 whence the estimated relative potency of test to standard 
preparation is 105.9%. 


3.3 Fiducial Limits for the Relative Potency 


Fiducial limits for the relative potency may now be found following 
Feiller (1940). We have a = 9, = 9; + 2b/8 (since the correction is 
subtracted in three cases and added in the remaining five) and g, and 
b are distributed independently, whence 


1 1 
= + EVO = (142. = 


also V(e) = 


| 
| 
a 5.085 : 
= 9.959 0-503 
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Since a and ¢ are distributed independently and (we shall assume) 
normally with variances V(a) and V(e) it follows that in repeated 
sampling 


z2=a-+nde 


is distributed normally about zero with variance 


V@) + VVC) = +h 1}, { 5376 
and that 
a+ re 
5376 


is distributed as Student’s ¢ with the same number of degrees of freedom 
as s (eight in this example). — 

Any suggested hypothetical value for \ would, therefore, be re- 
jected at the 100 a% level of significance unless 


4 1345 + 


where ¢, is the 100 a% level of significance of the ¢ distribution. The 
95% fiducial limits are provided, therefore, by those values of \ which 
just make the above inequality untrue i.e. which make it an equality. 
Substituting the numerical values for a, e, s, and ¢, and multiplying 
both sides of the equality by 5376 we have the quadratic equation in A 


531,895\” — 535,904 + 129,660 = 0 
which has the solutions 
= 0.6038 and = 0.4037 


Multiplying these solutions by 0.050 to reduce them to log dose units 
and taking antilogs we have the 95% fiducial units for potency of 
107.2% and 104.8%. 


4. CONSTRUCTION OF DESIGNS 


Suppose that we are to give k pairs of doses and that a polynomial 
in time (¢) of p-th degree will adequately represent the time-trend and 
a polynomial in dose level (d) of g-th degree will adequately represent 
the dosage-response curve in the absence of the time-trend; then pro- 
vided p + q is less than k we may take as the model. 
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+ ad’ a," (1) 
where 7 is the true response at dose d and time ¢. 

Now instead of considering powers of ¢ consider the orthogonal 
polynomials (see for example Fisher and Yates, 1942) which will be 
denoted by ,., ¢;,., ¢, corresponding to£é,,., , . , & in the Fisher 
and Yates notation. Equation (1) can then be written in the form 


= By + Bil + + Bits + Bt, + ad + 
+ a;d' (2) 


The orthogonal polynomials for equally spaced intervals* and k = 8 are 
given in Table 4. 


TABLE 4. ORTHOGONAL POLYNOMIALS FOR k = 8. 


ty ts ts te 
—7 7 —7 7 —7 1 
—5 1 5 —13 23 —5 7 
—3 —3 —3 —17 9 —21 
—5 3 9 ~—15 —5 35 
1 —5 —3 9 15 —5 —35 
3 —3 —3 17 9 21 
5 1 —5 —13 —23 
4 7 1 1 
z=? = 168 168 264 616 2184 264 343 


The problem now is to choose the levels of the dose d so that the 
estimates of the a’s are always uncorrelated with the estimates of the 
6’s. This problem is solved if 


> td’ =0 for all values of i and j (3) 


where the sign }> is used to denote summation over the k values of 
the polynomial. 


*Fisher and Yates tables give these polynomials up to fifth order only, but supply a recurrence 
formula from which the high polynomials are readily obtained. Alternatively an extended table giving 
the polynomials of all orders to k = 26 has recently been published by De Lury (1951). A design 
could, of course, be developed for values of t which are not equally spaced, but in this case, the values 
of the orthogonal polynomials would have to be calculated. This could be done either in the straight- 
forward manner as described for example, by Kendall (1946) or by the Choleski method of matrix 
inversion recently described by Rushton (1951). 


i 
a 
oe 
i 
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Since the polynomials up to ¢, are used to represent the time trend 
and the a’s are to be uncorrelated with the 6’s, d must. be a linear func- 
tion of the extra polynomials in time of order higher than p, namely 

We have, therefore, 


where the constants 7,4: , ... , Ye-1 are chosen to satisfy the relations 
represented by equation (3). 


4.1 Derivation of the Design Used in the Example 


To illustrate the method the design used in the example above, will 
be derived. In this example experience suggested that a polynomial 
of second degree would adequately represent the time trend whilst the 
log-dose/response lines would be linear. In the design it was decided 
to allow for the estimation of effects of one order higher than those 
actually expected. Allowance was made, therefore, to fit a polynomial 
of up to third degree to represent the time trend and up to second degree 
to represent the log dose/response lines. 

A preliminary examination shows that to obtain a satisfactory 
solution and allow a reasonable number of degrees of freedom for the 
estimation of error at least 8 pairs of doses would be required. We 
have then in the notation above, 


p = 3, q=2, k=8 
Equation (4) yields 
d = + sts + + (5) 


and because of the orthogonal property of the polynomials it is true 
whatever the values of the y’s that 


dt, = = = 0. (6) 


In addition we wish to arrange that the quadratic dose effect is also 
orthogonal to the time effects, so that we have to satisfy the three 
equations 


dYdt=0, (7) 


Now 
= (rate + + vole + 
= + 5(155) + y6(166) + 3(177) + 2ys75(145) 
+ Qysye(146) + ete. (8) 
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where for example (144) means >> #,{; . Similar expressions may be 
obtained for >> d’t, , and > d’t, . 

In practice most of the terms in these expanded expressions vanish 
since, for equally spaced values of t, the orthogonal polynomials of odd 
order are odd functions and those of even order are even functions. 
It follows that quantities like (157), (177), (144) involving three odd 
order polynomials or one odd and two even order polynomials are 
zero. Making use of the fact, the three equations derived from (7) are 


vsvs(145) + + + yev:(167) = 0 (9) 

ya(244) + 5(255) + 6(266) + 2(277) + 
+ 2ys77(257) = 0 (10) 
¥475(345) + + rsve(356) + = 0 (11) 


to which a wide variety of solutions may be found. In particular if d 
can be taken as a linear function of the even order polynomials only 
(i.e. if ys and 7 can be put equal to zero) we shall obtain a design in 
which only four distinct levels of dosage are used instead of eight. 
Since it was of considerable practical advantage to reduce the number 
of dose levels, it was decided to adopt a solution of: this kind. 

Putting ys and y, equal to zero all the terms in equations (9) and 
(11) become zero and (10) becomes 


yi(244) + 2yeve(246) + 76(266) = 0 (12) 


and the values of the constants are readily found from Table 4 of the 
orthogonal polynomials. In fact, 


(244) = 160, (246) = 840, (266) = —672 (13) 

whence writing y = y:/7y.s we have 
1607’ + 16807 — 672 = 0 (14) 
y = 0.385,823 or vy = —10.885,823 (15) 


Hence, the levels for a design fulfilling all the requirements are pro- 
vided either by 


d = c(0.385,823t, + t) of by d = c’(—10.885,823t, + t) (16) 


where c and c’ are scale constants at our choice. In practice it is con- 
venient to choose the scale constant so that the ‘standard deviation’ 


| 
| 
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(> d?/k)} of the levels is unity. The dose levels in these units we denote 
by d, , so we have >> d? = k. This choice of units enables the experi- 
menter to assess readily the scaling-up factor which will be required so 
that the design covers the range of levels desired. 

We require then 


Lace’ =k (17) 


whence 


k 4 


Substituting numerical values in (18) we find c = 0.149,970,1 
c’ = 0.010,449,2 and the required levels are 


0.057, 862t, + 0.149, 970%, —0.113, 755t, + 0.010, 4504, 
(1) 0.555,00 (1) —0.785,84 
(2) —1.592,06 (2) 1.426,57 
(3) 1.176,15 (3) 0.435,31 
(4) —0.229,09 (4) —1.076,05 
(5) —0.229,09 or (5) —1.076,05 
(6) 1.176,15 : (6) 0.435,31 
(7) —1.502,06 (7) 1.426,57 
(8) 0.555,00 (8) —0.785,84 


In the numerical example discussed above ne first of the two alternative 
designs was used. 

In the calculation of the quadratic dose effect it is convenient to use 
the orthogonal quadratic polynomial which is calculated from the 
formula (see for example Kendall 1946) 


d, = di — (Qi di/k)d, — 1 (19) 
In the present example we find, 
d, = d; + 0.400,748d, — 1 (20) 


which yields the values for the quadratic polynomial given in column 
(v) of Table 2. 


5. DISCUSSION 


From the above example the general procedure will be apparent and 
designs may be developed for the solution of other similar problems. 
As might be expected if p + q is nearly as large as k no solution will be 
possible. On the other hand, if p + g is small compared with k then a 
great variety of solutions may be possible. 
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When as in the example above a linear time trend is adequate the 
device of angular randomisation described by Box (1952) may be used 
for random selection of the design. The statistical analysis will then 


be exact whether the mathematical model precisely fits the situation 
or not. 
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A DOSE-RESPONSE EQUATION FOR THE INVASION OF 
MICRO-ORGANISMS 


Peto 
Microbiological Research Department, Ministry of Supply, Porton, Wilts. 


The Problem 


A series of k doses (n; , m2, *** , % micro-organisms) is administered 
to m, , m,, -** , m, test animals respectively, and the corresponding 
survivors are observed to number , 72, °°: , 7. (The terms “surviv- 
ors” and “‘killed’’ whenever they occur subsequently, are meant to cover 
also the case when “not infected” are compared with “‘infected’’). This 
paper derives a dose-response relation from a hypothesis based on the 
mode of action of the micro-organisms against their host first suggested 
by H. A. Druett (2). 

Assuming that 


(i) the test animals are homogencous, 
- (ii) the probability of one organism killing its host is p (small), 
(iii) the organisms act independently of each other, 


then if n is the number of organisms administered to each subject, the 
expected proportion surviving is given by 


(1) 


The statistical problem is to estimate the single parameter p from a 
series of observations of corresponding values of the actual proportions 
surviving r/m and the numbers of attacking organisms n. Equation (1) 
may be written 


In S = —pn (1a); 
thus if we plot 


T2 Tk 
M2 


against n, , 2, °°: , m, and fit a straight line to the resulting points, the 
negative slope of this line will represent p, the probability of any one 
organism killing an animal. 

It follows that dose-response relationships for all kinds of organisms 
and test animals should be represented by a single line of constant slope 
when the doses are expressed as multiples of the EDs, . For taking c 
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organisms to be the ED;, , equation (la) becomes 


In 0.5 = —pe (1b) 
and putting n = fc we obtain 
InS = fn0.5 (2) 


The Maximum Likelihood Solution 
Let the probability that one particular animal survives be 
P=(1— pe" 
then the probability of r surviving out of m animals at risk will be 
Pr{rsurv. of m} = - Pp)" = 
and the logarithm of the likelihood is 
L=-p nr; + (m; — r;) In (1 — e&"*”) + const. 


Hence 


dp ner; + > (m; 1 
and 
> (m; ri)n; (1 
p (m; (nip) 1) (4) 


Substituting n,p = 2x, equation (3) takes the following form for Lia. : 


am +? Em, 0 (5) 
p 


and using the same substitution for equation (4) the variance of p is 
given by 


Var (p) = Yo P = (6) 


1)’ 
Equation (5) can be readily solved to any degree of accuracy with the 
help of the Table A and the application of Newton’s Method of approxi- 
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mation; Table B evaluates expression (6) in order to obtain fiducial 
limits to p. 


The goodness of fit can be tested with x’, and confidence limits can 


be assigned at any desired level of dose or response. 


A Numerical Example 


This department obtained the following data (first 4 columns of 


Table I) from one of a series of experiments in which B. anthracis spores 
were allowed to enter guinea pigs by the respiratory route (3). 


The necessary steps to complete Table I and to obtain the required 


results are as follows: 


(1) After having entered the values r/m in the appropriate column, 
In r/m is plotted against (preferably on log paper) and a line 
through the origin is fitted by eye; an estimate p,(0.019) of the 
slope =p is thus obtained (see fig. 1). 

(2) The np, values are computed and entered in the z-column, leav- 
ing room for corresponding values of a second cycle (np,). 

(3) Tables A and B give the following values: 


z a/(1 — e*) — 1)? 
0.32 1.169 0.992 
0.66 1.366 0.964 
1.23 1.738 0.883 
1.91 2.242 0.744 


The 4 values z/(1 — e*) are multiplied in turn by 8, 18, 21, 
28(m — r column) the sum of these products being 133.214; the 4 
values x’e*/(e* — 1)? are treated likewise yielding 64.66. Then 
the residue of equation (5) f(p,) = 80 and Var (p) = 0.00000558 
are obtained as set out in the lower part of Table I. 

(4) A second approximation p, to p is computed by applying New- 
ton’s Method: p. = p, + f(p,) Var (p) = 0.0194. 

(5) In the z-column the new pn values are entered in brackets 
(0.33 --- 1.95), and f(p.) = +83 is evaluated using Table A. 
Clearly no closer approximation is required and in fact Var (p) 
need not be recomputed. 

(6) The expected numbers of survivors are calculated, and used to 
evaluate x” with degrees of freedom one less than the number of 

doses, since one parameter has been estimated from the data. 
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TABLE A 
z/(1 — e*) 


z 0.00 | 0.01 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 

0 1.000 | 1.005 | 1.010 | 1.015 | 1.020 | 1.025 | 1.030 | 1.035 | 1.041 | 1.046 

1 1.051 | 1.056 | 1.061 | 1.066 | 1.072 | 1.077 | 1.082 | 1.087 | 1.093 | 1.098 

2 1.103 | 1.109 | 1.114 | 1.119 | 1.125 | 1.130 | 1.136 | 1.141 | 1.147 | 1.152 

3 1.157 | 1.163 | 1.169 | 1.174 | 1.1807] 1.185 | 1.191 | 1.196 | 1.202 | 1.208 

4 1.213 | 1.219 | 1.225 | 1.230 | 1.236 | 1.242 | 1.248 | 1.253 | 1.259 | 1.265 

5 1.271 | 1.277 | 1.282 | 1.288 | 1.294 | 1.300 | 1.306 | 1.312 | 1.318 | 1.324 

6 1.330 | 1.336 | 1.342 | 1.348 | 1.354 | 1.360 | 1.366 | 1.372 | 1.378 | 1.384 

7 1.391 | 1.397 | 1.403 | 1 409 | 1.415 | 1.421 | 1.428 | 1.434 | 1.440 | 1.446 

8 1.453 | 1.459 | 1.465 | 1.472 | 1.478 | 1.485 | 1.491 | 1.497 | 1.504 | 1.510 

9 1.517 | 1.523 | 1.530 | 1.536 | 1.543 | 1.549 | 1.556 | 1.562 | 1.569 | 1.575 
1.0 1.582 | 1.589 | 1.595 | 1:602 | 1.609 | 1.615 | 1.622 | 1.629 | 1.635 | 1.642 
Oe 1.649 | 1.656 | 1.662 | 1.669 | 1.676 | 1.683 | 1.690 | 1.697 | 1.703 | 1.710 
1.2 1.717 | 1.724 | 1.731 | 1.738 | 1.745 | 1.752 | 1.759 | 1.766 | 1.773 | 1.780 
1.3 1.787 | 1.794 | 1.801 | 1.808 | 1.815 | 1.822 | 1.830 | 1.837 | 1.844 | 1.851 
1.4 1.858 | 1.865 | 1.873 | 1.880 | 1.887 | 1.894 | 1.902 | 1.909 | 1.916 | 1.924 
1.5 1.931 | 1.938 | 1.946 | 1.953 | 1.960 | 1.968 | 1.975 | 1.982 | 1.990 | 1.997 
1.6 2.005 | 2.012 | 2.020 | 2.027 | 2.035 | 2.042 | 2.050 | 2.057 | 2.065 | 2.072 
1.7 2.080 | 2.088 | 2.095 | 2.103 | 2.110 | 2.118 | 2.126 | 2.133 | 2.141 | 2.149 
1.8 2.156 | 2.164 | 2.172 | 2.180 | 2.187 | 2.195 | 2.203 | 2.211 | 2.219 | 2.226 
1.9 2.234 | 2.242 | 2.250 | 2.258 | 2.266 | 2.273 | 2.281 | 2.289 | 2.297 | 2.305 
2.0 2 313 | 2.321 | 2.329 | 2.337 | 2.345 | 2.353 | 2.361 | 2.369 | 2.377 | 2.385 
2.1 2.393 | 2.401 | 2.409 | 2.417 | 2.425 | 2.433 | 2.442 | 2.450 | 2.458 | 2.466 
2.2 2.474 | 2.482 | 2.490 | 2.499 | 2.507 | 2.515 | 2.523 | 2.532 | 2.540 | 2.548 
2.3 2.556 | 2.565 | 2.573 | 2.581 | 2.589 | 2.598 | 2.606 | 2.614 | 2.623 | 2.631 
2.4 2.639 | 2.648 | 2.656 | 2.665 | 2.673 | 2.681 | 2.690 | 2.698 | 2.707 | 2.715 
2.5 2.724 | 2.732 | 2.741 | 2.749 | 2.757 | 2.766 | 2.774 | 2.783 | 2.792 | 2.800 
2.6 2.809 | 2.817 | 2.826 | 2.834 | 2.843 | 2.851 | 2.860 | 2 869 | 2.877 | 2.886 
2.7 2.895 | 2.903 | 2.912 | 2.920 | 2.929 | 2.938 | 2.946 | 2.955 | 2.964 | 2.973 
2.8 2.981 | 2.990 | 2.999 | 3.007 | 3.016 | 3.025 | 3.034 | 3.043 | 3.051 | 3.060 
2.9 3.069 | 3.078 | 3.086 | 3.095 | 3.104 | 3.113 | 3.122 | 3.131 | 3.139 | 3.148 
3.0 3.157 | 3.166 | 3.175 | 3.184 | 3.193 | 3.202 | 3.211 | 3.219 | 3.228 | 3.237 
3.1 3.246 | 3.255 | 3.264 | 3.273 | 3.282 | 3.291 | 3.300 | 3.309 | 3.318 | 3.327 
3.2 3.336 | 3.345 | 3.354 | 3.363 | 3.372 | 3.381 | 3.390 | 3.399 | 3.408 | 3.417 
3.3 3.426 | 3.435 | 3.445 | 3.454 | 3.463 | 3.472 | 3.481 | 3.490 | 3.499 | 3.508 
3.4 3.517 | 3.527 | 3.536 | 3.545 | 3.554 | 3.563 | 3.572 | 3.581 | 3.591 | 3.600 
3.5 3.609 | 3.618 | 3.627 | 3.637 | 3.646 | 3.655 | 3.664 | 3.673 | 3.683 | 3.692 
3.6 3.701 | 3.710 | 3.720 | 3.729 | 3.738 | 3.747 | 3.757 | 3.766 | 3.775 | 3.785 
3.7 3.794 | 3.803 | 3.812 | 3.822 | 3.831 | 3.840 | 3.850 | 3.859 | 3.868 | 3.878 
3.8 3.887 | 3.896 | 3.906 | 3.915 | 3.924 | 3.934 | 3.943 | 3.952 | 3.962 | 3.971 
3.9 3.981 | 3.990 | 3.999 | 4.009 | 4.018 | 4.028 | 4.037 | 4.046 | 4.056 | 4.065 
4.0 4.075 | 4.084 | 4.093 | 4.103 | 4.112 | 4.122 | 4.131 | 4.141 | 4.150 | 4.160 
4.1 4.169 | 4.179 | 4.188 | 4.198 | 4.207 | 4.216 | 4.226 | 4.235 | 4.245 | 4.254 
4.2 4.264 | 4.273 | 4.283 | 4.292 | 4.302 | 4.312 | 4.321 | 4.331 | 4.340 | 4.350 
4.3 4.359 | 4.369 | 4.378 | 4.388 | 4.397 | 4.407 | 4.416 | 4.426 | 4.436 | 4.445 
4.4 4.455 | 4.464 | 4.474 | 4.483 | 4.493 | 4.503 | 4.512 | 4.522 | 4.531 | 4.541 
4.5 4.551 | 4.560 | 4.570 | 4.579 | 4.589 | 4.599 | 4.608 | 4.618 | 4.627 | 4.637 
4.6 4.647 | 4.656 | 4.666 | 4.676 | 4.685 | 4.695 | 4.705 | 4.714 | 4.724 | 4.733 
2.7 4.743 | 4.753 | 4.762 | 4.772 | 4.782 | 4.791 | 4.801 | 4.811 | 4.820 | 4.830 
4.8 4.840 | 4.850 | 4.859 | 4.869 | 4.879 | 4.888 | 4.898 | 4.908 | 4.917 | 4.927 
4.9 4.937 | 4.946 | 4.956 | 4.966 | 4.976 | 4.985 | 4.995 | 5.005 | 5.014 | 5.024 


Abridged from ‘‘A Table of the Function (G(z) = z/(1 — é*) and its Applications to Problems 
in Compound Interest” by J. F. Steffensen, Skandinavisk Aktuarietidskrift, 1938; by kind permission 
of the author and the publishers. 


TABLE B 
— 1)? 

z 0.00 | 0.01 0.02 | 0.03 | 0.04 0.05 | 0.06 | 0.07 | 0.08 | 0.09 

0 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.999 

1 0.999 | 0.999 | 0.999 | 0.999 | 0.998 | 0.998 | 0.998 | 0.998 | 0.997 | 0.997 

2 0.997 | 0.996 | 0.996 | 0.996 | 0.995 | 0.995 | 0.994 | 0.994 | 0.993 | 0.993 

3 0.993 | 0.992 | 0.992 | 0.991 | 0.990 | 0.990 | 0.989 | 0.989 | 0.988 | 0,987 

4 0.987 | 0.986 | 0.985 | 0.985 | 0.984 | 0.983 | 0.983 | 0.982 | 0.981 | 0.980 

5 0.979 | 0.979 | 0.978 | 0.977 | 0.976 | 0.975 | 0.974 | 0.973 | 0.972 | 0.971 

6 0.971 | 0.970 | 0 969 | 0.968 | 0.967 | 0.966 | 0.964 | 0.963 | 0.962 | 0.961 

7 0.960 | 0.959 | 0.958 | 0.957 | 0.956 | 0.954 | 0.953 | 0.952 | 0.951 | 0.950 

8 0.948 | 0.947 | 0.946 | 0.945 | 0.943 | 0.942 | 0.941 | 0.939 | 0.938 | 0.937 

9 0.935 | 0.934 | 0.932 | 0.931 | 0.930 | 0.928 | 0.927 | 0.925 | 0.924 | 0.922 
1.0 0.921 | 0.919 | 0.918 | 0.916 | 0.915 | 0.913 | 0.911 | 0.910 | 0.908 | 0.907 
| 0.905 | 0.903 | 0.902 | 0.900 | 0.898 | 0.897 | 0.895 | 0.893 | 0.892 | 0.890 
1.2 0.888 | 0.886 | 0.885 | 0.883 | 0.881 | 0.879 | 0.878 | 0.876 | 0.874 | 0.872 
1.3 0.870 | 0.868 | 0.867 | 0.865 | 0.863 | 0.861 | 0.859 | 0.857 | 0.855 | 0.853 
1.4 0.852 | 0.850 | 0.848 | 0.846 | 0.844 | 0.842 | 0.840 | 0.838 | 0.836 | 0.834 
1.5 0.832 | 0.830 | 0.828 | 0.826 | 0.824 | 0.822 | 0.820 | 0.818 | 0.816 | 0.814 
1.6 0.811 | 0.809 | 0.807 | 0.805 | 0.803 | 0.801 | 0.799 | 0.797 | 0.795 | 0.792 
i 0.790 | 0.788 | 0.786 | 0.784 | 0.782 | 0.780 | 0.777 | 0.775 | 0.773 | 0.771 
1.8 0.769 | 0.767 | 0.764 | 0.762 | 0,760 | 0.758 | 0.755 | 0.753 | 0.751 | 0.749 
1.9 0.747 | 0.744 | 0.742 | 0.740 | 0.738 | 0.735 | 0.733 | 0.731 | 0.729 | 0.726 
2.0 0 724 | 0.722 | 0.720 | 0.717 | 0.715 | 0.713 | 0.710 | 0.708 | 0.706 | 0.704 
2.1 0.701 | 0.699 | 0.697 | 0.694 | 0.692 | 0.690 | 0.687 | 0.685 | 0.683 | 0.681 
2.2 0.678 | 0.676 | 0.674 | 0.671 | 0.669 | 0.667 | 0.664 | 0.662 | 0.660 | 0.657 
2.3 0.655 | 0.653 | 0.651 | 0.648 | 0.646 | 0.644 | 0.641 | 0.639 | 0.637 | 0.634 
2.4 0.632 | 0.630 | 0.627 | 0.625 | 0.623 | 0.620 | 0.618 | 0.616 | 0.614 | 0.611 
2.5 0.609 | 0.607 | 0.604 | 0.602 | 0.600 | 0.597 | 0.595 | 0.593 | 0.590 | 0.588 
2.6 0.586 | 0.584 | 0.581 | 0.579 | 0.577 | 0.574 | 0.572 | 0.570 | 0.568 | 0.565 
2.7 0.563 | 0.561 | 0.559 | 0.556 | 0.554 | 0.552 | 0.549 | 0.547 | 0.545 | 0.543 
2.8 0.540 | 0.538 | 0.536 | 0.534 | 0.532 | 0.529 | 0.527 | 0.525 | 0.523 | 0.520 
2.9 0.518 | 0.516 | 0.514 | 0.542 | 0.509 | 0.507 | 0.505 | 0.503 | 0.501 | 0.498 
3.0 0.496 | 0.494 | 0.492 | 0.490 | 0.488 | 0.485 | 0.483 | 0.481 | 0.479 | 0.477 
3.1 0.475 | 0.473 | 0.470 | 0.:68 | 0.466 | 0.464 | 0.462 | 0.460 | 0.458 | 0.456 
3.2 0.454 | 0.452 | 0.449 | 0.447 | 0.445 | 0.443 | 0.441 | 0.439 | 0.437 | 0.435 
3.3 0.433 | 0.431 | 0.429 | 0 427 | 0.425 | 0.423 | 0.421 | 0.419 | 0.417 | 0.415 
3.4 0.413 | 0.411 | 0.409 | 0.407 | 0.405 | 0.403 | 0.401 | 0.399 | 0.397 | 0.395 
3.5 0.393 | 0.391 | 0.389 | 0.388 | 0.386 | 0.384 | 0.382 | 0.380 | 0.378 | 0.376 
3.6 0.374 | 0.372 | 0.371 | 0.369 | 0.367 | 0.365 | 0.363 | 0.361 | 0.359 | 0.358 
3.7 0.356 | 0.354 | 0.352 | 0.350 | 0.349 | 0.347 | 0.345 | 0.343 | 0.342 | 0.340 
3.8 0.338 | 0.336 | 0.334 | 0.333 | 0.331 | 0.329 | 0.328 | 0.326 | 0.324 | 0.322 
3.9 0.321 | 0.319 | 0.317 | 0.316 | 0.314 | 0.312 | 0.311 | 0.309 | 0.307 | 0.306 
4.0 0.304 | 0.302 | 0.301 | 0.299 | 0.298 | 0.296 | 0.294 | 0.293 | 0.291 | 0.290 
4.1 0.288 | 0.286 | 0.285 | 0.283 | 0.282 | 0.280 | 0.279 | 0.277 | 0.276 | 0.274 
4.2 0.273 | 0.271 | 0.270 | 0.268 | 0.267 | 0.265 | 0.264 | 0.262 | 0.261 | 0.259 
4.3 0.258 | 0.256 | 0.255 | 0.254 | 0.252 | 0.251 | 0.249 | 0.248 | 0.246 | 0.245 
4.4 0.244 | 0.242 | 0.241 | 0.239 | 0.238 | 0.237 | 0.235 | 0.234 | 0.233 | 0.231 
4.5 0.230 | 0.229 | 0.227 | 0.226 | 0.225 | 0.223 | 0.222 | .0.221 | 0.220 | 0.218 
4.6 0.217 | 0.216 | 0.215 | 0.213 | 0.212 | 0.211 | 0.210 | 0.208 | 0.207 | 0.206 
4.7 0.205 | 0.203 | 0.202 | 0.201 | 0.200 | 0.199 | 0.197 | 0.196 | 0.195 | 0.194 
4.8 0.193 | 0.192 | 0.190 | 0.189 | 0.188 | 0.187 | 0.186 | 0.185 | 0.184 | 0.183 
4.9 0.181 | 0.180 | 0.179 | 0.178 | 0.177 | 0.176 | 0.175 | 0.174 | 0.173 | 0.172 


Abridged from “An 8 place Table of the Einstein Functions” by J. Sherman and R. B. Ewell, 
The Journal of Physical Chemistry, June 1942 by kind permission of The Williams& Wilkins Company, 
Baltimore. 
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Experimental Verification of the Hypothesis 


The extent to which the basic hypothesis fits the experimental facts 
is shown by the figures in Table II which gives the pooled values of x’ 
for several experiments on these different micro-organisms. The indi- 
vidual responses are plotted in figs. 2a, b, c; the doses have been expressed 
as multiples of the ED,o’s, and the lines have the theoretical slope of 
In 0.5. It should be noted that in the case of B. anthracis the experiments 
were not replicates since the particle size of the cloud was made to vary 
considerably from experiment to experiment, so that individual regres- 
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FIG. 1. 


Relationship between dose of B. anthracis spores and In proportion surviving of guinea pigs, showing 
regression line (fitted by eye). 
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sion lines were obtained in widely different positions. The 8 experiments 
on B. typhosum were picked at random from 56 similar ones. 


TABLE II. VARIOUS RELATIONSHIPS FITTING ONE FIXED LINE 


Route of | No. of Degr. 
Organism Test Animals Infection | Expts.| x? | of fr. | Graph 
Brucella suis (4) | 300 Guinea pigs | Respiratory 4 3.2 6 | fig. 2a 
B. anthracis (3) | 990 Guinea pigs | Respiratory 5 25.9 | 22 | fig. 2b 
B. typhosum (1) | 640 Mice Intraperi- 
toneally 8 19.8 | 14 | fig. 2c 
0 I 2 3 
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FIG. 2a, 


Fixed regression line in relation to the result of 4 experiments with Brucella suis; doses of the sub groupe 
are expressed in multiples f of the EDse of each individual experiment. 


> 

i 


328 


BIOMETRICS, SEPTEMBER 1953 


Estimation of Relative Potency 


If different strains of the same pathogenic organism are to be com- 
pared as to their virulence with respect to a host, one dose-response line 
is fitted for each strain and the ratio of the slopes estimates the relative 
potency. For let p, and p, be the slopes of “standard” and “unknown” 
respectively and n, and n, be doses producing a common response S; it 
follows from equation (la) that n,.p, = np, or 


Relative Potency R = ™ = ?: 
Nu D. 
° ; Dose 
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FIG. 2b. 


Fixed regression line in relation to the result of 5 experiments with Bounce as 
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Since Var (p,) and Var (p,) have been computed already together with p, 


and p, , fiducial limits to R are readily assigned as 7 
R + + (1 — g) Var p./Var p,] 


where g = ¢’ Var p,/p; and ¢ is the normal deviate for the level of prob- 
ability to be used (5). 


Comparison with probit analysis 


This Department used to apply the methods of probit analysis to 
microbiological dose-response relationships (6). When for example ani- 
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mals were exposed to pathogens, a (log,. dose) — (probit killed) regres- 
sion equation has fitted well the results in most cases. If the hypothesis 
expressed in this paper holds good, i.e. if In S = —pn (equation 1a) 
actually applies, all the probit lines computed so far ought to show about 
the same slope within experimental error (Examples are quoted in Ref. 
2). For log,o dose — probit killed transforms an ideal dose — In propor- 
tion surviving into a slightly bent curve (see fig. 3) and it is easily proved 
that, if Y is the Probit, the slope dY/d logy. n at the EDs. equals 2 
(very nearly) 


5.0) 


PROBIT KILL 
& 


(10 X multiple-ED,o) 


0.6 07 os 0.9 1.0 1 1.2 1.3 


FIG. 3. 
An ideal dose — In proportion surviving relationship plotted as probits against log dose. The slope 
at the EDs is 2. 
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-Sp (B) and 


———— = n log, 10 (C); 


Substituting in the product of (A) x (B) X (C) for Y = 5, for S = 0.5 
and for pn = —In 0.5 gives 2.0003. 
It may be concluded that 


(i) the two models differ only slightly and could not be experiment- 
ally distinguished without forbidding expenditure in test ani- 
mals, 

(ii) that the calculated ED;. should be on the whole lower when 
probit analysis is used (see Table III), 

(iii) the present method compares favourably with probit analysis as 
far as computation labour is concerned. 


Examples from practice have been worked out by both methods and 
the results are given in Tables III and IIIA. The estimates of relative 
potencies provided by the two techniques are in good agreement through- 
out; the probit method appears in general to give slightly narrower 
fiducial belts but the point at issue, of course, is not which method gives 


more apparent precision but which one is based on the more appropriate 
mathematical model. 


TABLE III. COMPARISON OF TWO METHODS OF ANALYSIS 


probit analysis 

present method 

x?/df. EDs and its 95% fid. Its. 

Organism 

A|B A B 
B.anihracis, 1, particles*|0.72|0.44) 0.24 > 0.34 0.43 | 0.29 + 0.36 0.47 
B.anthracis, particles |1.33|0.94| 0.29 0.36 0.44 | 0.30 > 0.36 — 0.47 
B.anthracis, 4.5u particles |1.16|1.08| 0.40 0.51 0.70 | 0.44 + 0.53 0.68 
B.anthracis, 8p particles |1.46)1.51] 3.8— 4.0—- 5.1 
B.anthracis, 12 particles |0.10)1.28) 4.0—> 5.3— 7.1] 5.1—- 6.0—- 7.3 
Brucella suis 0.06/0.77| 6.9 + 31.7 — 52.7 | 34.0 — 45.3 — 68.0 
Brucella suis 0.88/0.67) 25.3 — 35.3 — 57.0 | 25.5 > 35.0 — 55.9 
Brucella suis 0.26/0.05) 21.0 — 34.8 — 47.9 | 26.0 — 36.1 — 57.0 
Brucella suis 0.44/0.41) 19.0 + 31.8 44.1 | 24.1 + 32.7 51.0 


*See Table I. 
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TABLE IIIA. POTENCIES OF 2 MUTANTS OF Ty2—STRAIN OF B. Typhosum RELATIVE 
TO PARENT STRAIN. 2 DIFFERENT EXPERIMENTS. RESULTS OBTAINED BY THE 
TWO METHODS 


x?/df. Rel. Potency and its 95% fid. Its. 


Expt. Organism 
A|B A B 


MVT. 75 | Ty2 1.96)3.04 
Mutant a |0.27/0.23) 2.34 -3.39 —4.90 | 2.17 -3.43 -—5.78 
Mutant b |0.19/1.60) 0.73 -1.05 -1.50 | 0.58 -1.01 —1.70 
MVT. 85 | 0.95)0.56 
Mutant a |0.24/1.27| 0.02 —0.029-0.042 | 
Mutant b |0.27/1.47| 0.047--0.069--0.099 | 0.057--0.075-—0.110 
Choice of doses 


When experimental evidence has shown that the dose — In proportion 
surviving regression line holds good for a certain host-parasite relation- 
ship, the question arises: What is the most economical investment of test 
animals? In other words: Which dose n will minimise the variance of the 
regression coefficient p? 

Using equation (4) for one point: 


dp (e"" — 1)’ 
and putting (m — r) equal to its expected value m(1 — e~”") we obtain 
—1 e" — 1 
Var (p) = (7) 
dp” 
and hence the Invariance of p can be expressed in the form 
1 m (In 


s 


Differentiating equation (7) w.r. to n and putting 


d(Var p) _ 0 
dn 


the equation is simplified to 
— pn) —2=0. 
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Hence Var (p) is a minimum for 


pn = 1.5936 (9) 

which corresponds to 
e** = 20.3% survivors (9a) 
Inv (P) max = X 0.6476 (9b) 


In fig. 4 equation (7a) is plotted as a percentage of its maximum 
value (equation 9b). It will be noted that the efficiency within the range 
of 10%-35% survivors is hardly affected; in practice the experimentor 
would aim at a moderately low survivor rate and know that, if he misses 
within reason the theoretical 20.3% optimum (equation 9a), economy 
would still not be appreciably impaired. 
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FIG. 4. 
Relative Invariance of an estimate of p based on a single dose-level as a function of the proportion 
surviving. 


If little is known about p and even linearity is doubtful, obviously 
more than one point is needed; but high survivor rates which carry little 
weight will be avoided, if possible. 


Application to dilution series 


After this paper had been drafted the close analogy with the analysis 
of dilution series was pointed out to me by Mr. M. J. R. Healy. Using 
the notation of this paper the relevant problem can be presented in the 
following rather condensed form. (For details references (7), (8) may be 
consulted.) 
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A series of k dilutions with dilution factors n, , n., +--+ , mis prepared 
from a bacterial suspension whose density of p organisms per unit volume 
is to be estimated, m, , m, , --- , m, samples respectively are incubated 
and the resulting sterile samples are observed to number 7, , f2 , *** 5 Tr 
respectively. It can be shown that under certain assumptions the ex- 
pected proportion of sterile samples S = e™ (7). 

Obviously the present method should be able to deal with the prob- 
lem. Fisher and Yates (8) give an example (the tests of a potato flour 
for the estimation of the density of spores of B: mesentericus) to illustrate 
Fisher’s approximative method which Finney also used to elucidate his 
method (7). 

Table IV gives the estimated density of organisms per gram with its 
95% limits by the three techniques. 


TABLE IV. 
Density 
(organisms /gram) 95% limits 
Fisher 760 407-1440 
Finney 766 429-1370 
Present method 766 321-1211 


Summary 


The mode of action of pathogenic micro-organisms is explained by a 
hypothesis (first given by H. A. Druett) which leads to a linear relation 
between dose and log proportion surviving; its maximum likelihood solu- 
tion is given and tables are provided to help computation. When doses 
are expressed in terms of the ED,» a fixed regression line to fit all patho- 
genic relationships emerges from the underlying hypothesis. Illustrative 
examples taken from practice are quoted. Comparison with Probit 
analysis is discussed. The problem of economical use of test animals is 
treated. The method can be applied to dilution series. 


Acknowledgements 


It is a pleasure to thank Mr. E. C. Fieller, Ministry of Supply, for _ 
encouragement and helpful suggestions while this paper was being pre- 
pared. I am indebted to Mr. R. Ash for helping with the necessary 
computations. Acknowledgement is made to the Chief Scientist, Minis- 
try of Supply, for permission to publish this paper. 


INVASION OF MICRO-ORGANISMS 335 


REFERENCES 


(1) Bacon, G. A., Burrows, T. W. and Yates, M., The Effects of Biochemical Muta- 
tion on the Virulence of Bacterium Typhosum. Brit. J. of Exp. Path. 31: 714, 
1950 and 32: 85, 1951. 

(2) Druett, H. A., Bacterial Invasion, Nature 170: 288, 1952. 

(3) Druett, H. A., Henderson, D. W., et al. in preparation. 

(4) Elberg, S. S. and Henderson, D. W., Respiratory Pathogenicity of Brucella. 
J. of Inf. Dis. 82: 302, 1948. 

(5) Fieller, E. C., A Fundamental Formula in the Statistics of Biological Assay, and 
some Applications. Quart. J. Pharm. Pharmacol., 17: 117, 1944. 

(6) Finney, D. J., Probit Analysis, Cambridge Univ. Press, 2nd Edn., 1952. 

(7) Finney, D. J., Statistical Method in Biological Assay (§21.5) London: Charles 
Griffin & Co. 

(8) Fisher, R. A. & Yates, F., Statistical Tables for Biological, Agricultural and Medical 
Research, Edinburgh: Oliver & Boyd. 


Correction to “II. Some Statistical Problems in Measuring the Subjec- 
tive Response to Drugs” by Frederick Mosteller, Biometrics, Vol. 8, 
No. 3, 1952, p. 220-231. In this article McNemar’s formula for testing 
for changes in a 2 X 2 table is discussed. The reader was referred to 
Cochran’s article (The comparison of percentages in matched samples, 
Biometrika, Vol. 37 (1950), pp. 256-266) for a discussion of related and 
more complicated problems. However, it was erroneously stated that 
Cochran’s description of the test does not contain the correction term. 
In the text immediately following the formula, Cochran gives the method 
for correcting for continuity. 


Errata: “Design and Analysis of Triangular Singly Linked Blocks” by 
K. R. Nair. Biometrics, Vol. 9, No. 2, 1953, p. 147. In the paragraph 
just below Table 3, “‘inter-block” should be changed to “intra-block.”’ 


Errata: Abstract 231, Biometrics, Vol. 9, No. 2, 1953, p. 267. In the 
title of the abstract by J. A. Fraser Roberts, the word ‘‘Independent”’ 
should be changed to “Dependent.” 
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ANALYSIS OF EGG SHAPE OF CHICKENS 
Frep T. SHuLtTz 


University of California 
Berkeley, California 


The principles of population genetics as formulated by Wright (1921) 
elucidated by Lush (1945), and applied to poultry by Lerner (1950), 
provide the basis for an investigation into the relative contributions of 
heredity and environment to egg shape. The purpose of this study is 
twofold: (1) from the practical standpoint, a knowledge of the herit- 
ability of egg shape and the correlation of egg shape with characters 
under selection, particularly egg weight, may be useful in breeding 
programs; (2) from the theoretical (and ultimately practical) standpoint, 
egg shape is a convenient character for studies on the relationship of 
environmental factors to balanced genetic systems.* 

The definition of egg shape used here is 100 times the maximum 
breadth divided by the greatest length. This shape index was proposed 
by Pearl and Surface (1914) and is the one used by Marble (1943) who 
postulated the polygenic nature of this trait. 


MATERIALS 


The populations used in this study consist of the production-bred 
flock (Lerner, 1950) and eight single sire inbred lines (Shultz, 1953) 
maintained at the University of California. Measurements were ob- 
tained from 498 pullets produced from 12 sires and 117 dams of the 
production-bred flock. The eight inbred lines were derived from the 
production-bred flock in 1944. Two lines each were selected for high 
(HNP, HNM) and low (LNP, LNM) November egg number and high 
(HWP, HWM) and low (LWP, LWM) November egg weight. Data 
from reciprocal crosses between lines selected similarly (e.g. HNP X 
HNM) are also analyzed. Measurements were obtained for a total of 
134 pullets from these lines and 154 pullets from the crosses between 
lines. All pullets studied were hatched in 1950. 

The first and second egg from each of two clutches from each pullet 
were measured during March and April, 1951. Double-yolked eggs, 
wind eggs, rough shelled eggs, and abnormally shaped eggs were not 
considered. Measurements of length and breadth were made with 


*This paper is ore of a group entitled Inbreeding and balanced genetic systems in the chicken on file 
as a Ph.D. thesis in the library of the University of California, Berkeley. For other published material 
from this thesis, see Shultz and Briles (1953) and Shul.z (1953). 
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micrometers, readings being taken to the nearest hundredth of a milli- 
meter. 
PRODUCTION-BRED FLOCK 


Variance analysis of positions combined. The variance components 
to be considered are S (between sires), D (between dams within sires), 
P (between pullets within dams), U (between clutches within pullets), 
Q (between the two positions of a clutch), and Jgs , Jgn , Jqp , and Igy 
(interactions of positions with sires, with dams within sires, with pullets 
within dams, and with clutches within pullets, respectively). The com- 
ponent Jgy should more properly be considered an error term. The two 
forms of analysis of variance used for computations and their relation- 
ship to the third form of partition of variance from which the various 
components of mean squares can be obtained are shown in figure 1 


positions 1 1 Between positions) 


Between sires u 22 Between sires 
within positiond 
Sire-position interaction n 


Between dams 105: 210 Between dams 
within sires an within sires 
position interaction 105 


within sires 


Between pullets B1 762 Between pullets 
within dams an within dams 
Pullet-position interaction 361 


within dams 


Between clutches 96 996 Between clutches 
within pullets — within pullets 
Between positions Error (clutch-position 


“ithin clutches interaction within 
pullets) 
FIGURE 1. 
RELATIONSHIPS BETWEEN FORMS OF ANALYSIS OF VARIANCE. FORM III IS DE- 
RIVED BY SUBTRACTIONS OF SUMS OF SQUARES OF FORM I FROM FORM II. NUM- 
BERS REFER TO DEGREES OF FREEDOM. 


Between sires u 


Since there are equal numbers of eggs from each clutch position within 
each of the other subclases (sire, dam, or pullet) unbiased estimates of 
the interaction terms between positions and the other sources of varia- 
tion can be obtained. 

The mean squares of Form III of the analysis of variance, their com- 
position in terms of components, and the tests of significance of the 
various interactions are presented in table 1. The composition of mean 
squares indicates which comparisons should be made for the F test. 
For example, the sire-position interaction mean square should be com- 
pared to the dam-position interaction in the test of significance of Igs . 


Between dams 105 = 
within sires : 
Between pullets 
within dans 
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None of the interaction terms are significant nor do their relative magni- 
tudes suggest that there is any interaction present. 

In the absence of interaction, all interaction terms may be combined 
to give a new error term to be used in the test of significance of the be- 
tween positions and between clutches mean squares (table 1). A sig- 
nificant difference (.01 level) between positions and between clutches 
was found. 

The mean of the first position (73.55) was significantly lower than the 
mean of the second position (73.87). This is in agreement with the data 
of Marble (1943) in which the first egg of the clutch was almost always 
longer and narrower than subsequent eggs in the same clutch. 

The significant difference between clutches suggests a seasonal trend 
in egg shape. This was found to be the case, the mean index of second 
clutch eggs (73.96) being higher than that of first clutch eggs (73.48). 
The direction of the difference was consistent for all 12 of the sire families 
taken separately. Lerner (unpublished) found a significant difference 
between eggs laid in the fall and eggs laid in the spring in a group of 
24 SCW Leghorn pullets from a different flock. Benjamin (1920) also 
found a significant seasonal trend in SCWL chickens, the index reaching 
a peak in the fifth month of production and then declining. Marble 
(1943), on the other hand, failed to find any indications of a seasonal 
trend in a flock of Barred Plymouth Rocks. 

The heritability of the shape index on the basis of a single egg per 
pullet, assuming that the egg was selected at random from among the 
four eggs measured, was estimated at .124 (table 5). The heritability 
on the basis of four eggs per pullet as measured here and on the theo- 
retical basis of an infinite number of first and second position eggs was 
estimated at .167 and .189 respectively. 

Variance analysis of positions separately. The lack of interactions 
between positions and other sources of variance allows considerable 
simplification in the analysis of variance. Considering each position 
separately, the form of the analysis of variance. and the composition of 
mean squares becomes that shown in table 2. The interpretations of 
these components are given in table 3 in which o%, is the genetic vari- 
ance due to autosomal genes, o¢, , is the genetic variance (in the hetero- 
gametic females) due to sex-linked genes, o¢ is the variance due to 
maternal effects, o7 is the variance due to environmental effects common 
to all eggs of the same position of an individual bird, of is the variance 
due to environmental causes peculiar to each egg, and ri’, rie" 
are the autosomal and sex-linked genetic correlations for full and half- 
sibs. The values of the variance components are presented in table 4. 

The analytical design (table 3) will not allow the disentangling of all 
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TABLE 2 


Form of analysis of variance for clutch-positions separately and the covariance 
between clutch-positions 


Source of variation df. Composition of mean square 
(or covariance) 


Between sires U+%,P + 28 
Between dams within sires 105 U + ’,P + iaD 
Between ‘pullets within dams 381 U + 1,P 

Between clutches within pullets 498 U 


of the various genetic and environmental variances. Although the 
effects due to sex-linkage would be expected to be small (see discussion 
by Lerner, 1950) the sire component exceeded the dam component for 
each of the two positions suggesting that some sex-linkage does exist 
and that maternal effects are small or absent. Consequently, the best 
estimate of the total genetic variance appears to be twice the sum of the 
sire and dam components or 2(S + L). The enrivonmental variance 
designated of can best be estimated by the difference between the pullet 
component and the sum of the sire and dam components or P — (S + D). 
The estimate of the genetic variance will contain twice the maternal 
effect whereas the estimate of of will be too small by the magnitude of 
the maternal effect. Since the maternal effect is small if present at all, 
estimates obtained in the above manner should not be in serious error 
from this cause. 

The relationships between the various causes of variation are shown 
by the path diagram for one position in figure 2. The genotypes of the 
sire and dam (Gs and Gp respectively) each contribute to the genotype 
of the pullet (Gp), r, and r{ being the correlations between full and 
half-sibs respectively. The genotype of a pullet and the environment 
(EZ) contribute to what may be called the potential capacity of the indi- 
vidual to produce eggs of a particular shape (K). In turn, this constitu- 
tion and additional environmental factors (E) determine the final shape 
of the egg (Z). The E portion of the environment represents factors 
varying in their effects from time to time during the period of lay includ- 
ing errors of measurement. The phenotypic correlations between eggs 
are designated by ¢. If more than one egg of the same position from the 
same pullet is measured (in this case 2), the shape of each egg will con- 
tribute to a mean phenotype (Z) for that bird. 

Heritability is the ratio of the additive genetic variation to the total 
variation. Hence, a decrease in the environmental portion of the total 
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TABLE 3 


Interpretations of components of variance and covariance analyses. Substitutions in 
general form made for females only. F is Wrights (1921) coefficient of inbreeding of 
the generation analysed. F’ is the coefficient of inbreeding of the parents. Formulas 
for genetic correlations under inbreeding (including those for males) are also given. 
For single sire lines, lines and sires become synonymous and r®, equals zero. 


A—autosomal. SL—sex-linked. F,4 does not equal F gy . 


Non-inbred population 
Compo- General form* 
nent Variance Covariance 
L + F,)oG. + + Fsio’Gsi 
Ca 2 
+ + Fsidots, 
Ga _ 2 
D rhe + doi, + 1COVo, + COV. 
too, + debs, + + 1COVes, + COV 
U of ob COV. 
2 
Total (1 + COVe, + COVes, + COVe 
+ (1+ + oc + o% + 0% + COV, +COVe 
Inbred lines 
2F. 1 + + 6F. 1+ Fi + 2F. 
osomal, males, and females i+ 40 + Fa) 2 + Fa) 
1 + Fs. + OF 3+ Fs: + 4F 
ed, maies Tr 4(1 Th 4(1 + 
Single sire line crosses 
Autosomal, males, and females: i+ F,’ Tae Tae 2 


*Covariance follows this same form. 

**For sex linkage, the genetic variance as measured 1n the heterogametic sex is not increased by 
inbreeding. Therefore the term (1+ Fs ,) given in the general formula becomes 1 in the case of females. 
***Correlations for non-inbred populations as they apply to poultry are given by Lerner (1950, 
table 9). However, a mistake was made in the derivation of Lerner’s table from that given by Lush 
(1945) for mammals. The correct values for the last three lines in Lerner’s table dealing with paternal 
half sibs relationships should read 0.50, 0.25, and 0.35 for the case in which all genes are sex-linked and 

268, .250, and .256 for the case in which five per cent of the genes are sex-linked. 
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TABLE 3—Continued 


Compo- Single sire inbred lines** Line crosses 
nent (Genetic variance only) (Genetic variance only) 
L ies 
14+F4+6F, 2. 2 1+F4+2F 2 
4 [Ga 2 4 
D 1+F4,—2F, » 1+F4—2F 2 
U 
Total (1+ 


variance will result in an increase in the heritability. Such a decrease in 
the effects of the daily environment (Z) can be obtained by taking more 
than one measurement of egg shape of a pullet. This leads to the possi- 
bility of three estimates of heritability: 

h? pertains to the single egg. This determination, in which all of 
the effects of the transient environment (EZ) are present, is 
analogous to estimates of characters for which only one measure- 
ment is possible or commonly taken (e.g. egg number, body size). 

h? depends on the number (n) of eggs measured per pullet. The 
greater the number of eggs measured the higher will be this 
estimate assuming that the components of variance remain con- 
stant over the longer period of time required to obtain the addi- 
tional measurements, an assumption not necessarily warranted. 
This estimate is analogous to those for egg weight, several eggs 
from each bird commonly being weighed. 

h’ constituted the limiting case in which n is assumed to be infinity. 

The estimates of the various heritabilities and environmental effects 
are given in table 5. 

Covariance analysis. Because a significant difference between posi- 
tions was found (table 1) and since the causes of variation appeared to 
be of somewhat different magnitudes for the two positions, a covariance 
analysis was carried out to determine the genetic and environmental 
correlations. The form of the covariance analysis follows that of the 
variance analysis (table 3). The relationships between the causes of 
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TABLE 5 


Estimates of heritabilities and environmental effects for the production-bred flock 


Effect Formula Position 1 | Position 2 Both 
Positions 
é 369 333 
S+D+P+U 
2 P—(S+ D) 
D+P .753 839 
2 2(S + D) 
h S4DiP +P .247 161 .189 
2 Z(S + D) 
hz .191* .129* .167** 
2 Z(S + D) 
hz S+D+P+U .156 .108 .124 
*Two eggs 


Four eggs, two from each position. 


variation and their effects within the individual are shown in figure 3. 
The paths for each position are the same as those shown in figure 2, the 
more complicated arrangement being due to the necessity of representing 
both positions of a clutch on the same diagram. 

The values of the correlations r°, r“, and r* were found to be .909, 
1.013, and .162 respectively (table 6). The phenotypic correlation was 
.697. 


INBRED LINES 


Effects of selection. There appears to be no relationship between 
mean egg shape and the direction of selection for egg weight or egg 
number in the inbred lines (table 7). These lines became differentiated 
with respect to egg weight as a result of direct selection for egg weight in 
the HW and LW lines and as a result of correlated response of egg weight 
to selection for egg number in the HN and LN lines, the descending order 
being HW, LN, HN, and LW (Shultz, 1953). The order with respect to 
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egg shape is HW, LW, LN, and HN with the individual lines being 
mingled. Other investigators have also failed to find evidence of a corre- 
lation between egg weight and egg shape (Pearl and Surface, 1914; 
Benjamin, 1920; Marble, 1943). 

Analysis of variance and covariance. Since no interaction between 
positions and other sources of variation was found in the production-bred 
flock, the analysis of variance of positions separately and the covariance 
analysis only were undertaken in the inbred lines. It should be noted 
that these lines are single sire lines and hence the between sire com- 
ponent also includes the difference between lines. The sire component 
in the case of crosses represents a pooling of the differences between 
reciprocal line crosses with diffe:2nces between unrelated crosses. The 
values of the various components are given in table 4. 

Interpretation of components. The interpretation of the components 
of variance obtained from inbred lines is different from that of larger 
random bred populations, such as the production-bred flock, due to the 
effect of random fluctuation of gene frequencies on the genetic variance 
(Dickerson, 1942). If there is no selection, then the proportion of the 
genetic variance contained in each of the components depends on the 


TABLE 6 


Correlations between first and second clutch positions. Components of covariances 
designated by subscript 12. Components of variance designated by subscript 1 or 2 
for first and second positions respectively. 


Prod- 


Corre- In- 
Formula Bred CTOsses 
Flock 
0.909 1.049 0.874 
V(Si+D,)(S2+ Dz) 
Dis 1.013 0.952 1.578 
D2) 
0.162 0.097 0.020 
SutDutPrs 0.997 0.995 0.935 
V (Si 
St Us 0.697 0.604 0.550 


Us) 


} 
4 
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Mean shape indexes ee of inbred lines and line crosses. In crosses, sire’s 


BIOMETRICS, SEPTEMBER 1953 


TABLE 7 


100 breadth 


line is given first. 


Values for the between pullet component of variance, P, for November egg weight 


Line or Cross Number of pullets Position 1 Position 2 
HNP 7 72.45 71.50 
HNM 10 70.84 73.02 
LNP 33 74.80 74.91 
LNM 26 72.20 73.36 
HWP 15 75.51 75.23 
HWM 14 74.92 74.83 
LWP 26 75.11 75.40 
LWM 3 72.83 75.95 
Weighted mean 73.99 74.40 
Unweighted mean 134 73.58 74.28 
HNP X HNM 25 72.33 71.84 
HNM X HNP 14 72.30 72.30 
LNP X LNM 17 73.93 74.32 
LNM X LNP 9 71.28 71.94 
HWP X HWM 22 75.45 76.83 
HWM X HWP 20 74.72 74.84 
LWP X LWM 9 75.02 75.76 
LWM X LWP 38 74.14 74.88 
Weighted mean 73.80 74.25 
Unweighted mean 154 73.65 74.09 
Prod-bred flock 498 73.55 73.87 

TABLE 8 


and shank length in the 1950 generation inbreds and crosses. 


Number November egg weight Shank length 
of Lines selected for 
lines Inbreds | Crosses | Inbreds | Crosses 
2 High November egg weight | 14.211 13.905 . 1000 .0861 
4 High or low November egg 
number 10.784 7.860 .0844 .0463 
2 Low November egg weight 8.347 4.610 .0589 .0351 
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FIGURE 2. 


PATH DIAGRAM FOR .ONE CLUTCH POSITION. THERE WAS NO EVIDENCE OF 
MATERNAL EFFECTS (C). EXPLANATION OF SYMBOLS IN TEXT. 


coefficients of inbreeding of the pullets (F) and of their parents (F’). 
Following Wright’s (1921) methods of path coefficients, the genetic corre- 
lations due to autosomal and sex-linked genes were found to be as given 
in table 3. 

In this table, Fs, and F's; , applying to sex-linked genes, are defined 
as the correlation between gametes each of which contain a sex chromo- 
some. Hence, F's, is the inbreeding coefficient of the homogametic 
parents. It should be noted that the expected rate of increase in F will 
be different for autosomal and sex-linked genes (Crow and Roberts, 
1950) so that F, is not equal to F's, and F{ is not equal to Fs, . If 
selection, natural or artificial, is acting to retard the increase in homo- 
zygosity due to inbreeding, the coefficients, F and F’, calculated on the 
basis of relationship will be overestimated. 

Under the assumption of additive gene action, the subdivision of a 
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FIGURE 3. 


PATH DIAGRAM SHOWING RELATIONSHIPS BETWEEN CLUTCH POSITIONS (1 AND 2) 
WITHIN CLUTCHES WITHIN PULLETS. EXPLANATION OF SYMBOLS IN TEXT. 


population into a number of isolated lines (such as was done in effect 
in this material) will theoretically result in an increase in the total genetic 
variance by the factor 1 + F, for autosomal genes and 1 + Fs, for 
sex-linked genes (homogametic sex). No increase will occur with respect 
to sex-linked genes in the heterogametic sex although the distribution of 
the genetic variance within and between lines will change. Considering 
only the homogametic sex, the genetic variance between lines increases 
as inbreeding occurs as illustrated by the expression 


1+Fi+6F, 1+ Fs, 2 
4 CGa + 2 
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for component S in single sire lines. The genetic variance within single 
sire lines decreases in accordance with the expression 


3 — Fi — 2F, 2 
4 


Changes in the genetic variance due to random fluctuations of gene fre- 
quencies resulting in the differentiation of lines and an increase in genetic 
variance between lines has been called genetic drift, the possible evolu- 
tionary significance of which has been particularly elaborated by Wright 
(1931). These formulas (table 3) furnish a measure of the genetic drift 
expected in the absence of selection. However, if gene action is not 
additive, i.e. with dominance, the genetic variance would not behave in 
the above manner (Robertson, 1952). 

Nothing is known about the effect of inbreeding on the magnitudes of 
the other causes of variance. In this material, maternal effects appar- 
ently are not present as shown by the analysis of the production-bred 
flock and hence can be ignored here. The magnitude of of can be ob- 
tained directly and comparisons made between the production-bred 
flock, the inbred lines, and line crosses. An estimate of of free from any 
effects of selection on the genetic variance cannot be obtained directly. 
However, the amount of genetic variance contained in the component P 
is the same for both inbreds and crosses, being dependent in both cases 
on the inbreeding of the parents only (table 3). Since the same sires 
were used in producing both inbreds and crosses, comparisons of the 
respective P components should give a good estimate of the relative 
effects of the environment E on inbreds and crosses. 

Results. The inbred lines and the crosses between them show evi- 
dence of sex linkage. Thus the sire component is much larger than the 
dam component (table 4) the difference between them being greater than 
would be expected from the action of inbreeding on the genetic variance. 
This is in agreement with the findings in the production-bred flock dis- 
cussed previously. The means of the crosses also indicate sex linkage, 
the order of reciprocal crosses between two inbred lines following the 
order of the sires’ lines in all cases except for the second position in the 
crosses between LWP and LWM. 

The differences between sires from different inbred lines is greater 
than the difference between sires in the relatively non-inbred production- 
bred flock (table 4). This divergence of the inbred lines must have been 
due to genetic drift since there appears to be no relationship between 
artificial selection as practiced in these lines and egg shape. 

The difference between dams, contrary to expectations, is larger in 
the inbred lines than in the production-bred flock and line crosses 
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although still of rather small magnitude. This appears to be a reflection 
of sampling errors and the small amount of autosomal genetic variance. 

The difference between pullets is less within inbred lines and their 
crosses than in the production-bred flock as expected on the assumption 
of genetic drift. However, P from the inbreds is greater than P from the 
crosses. Since the expected portion of the genetic variance contained in 
this component is the same in each case, the difference appears to be due 
to a greater effect of the environment (£ in figures 2 and 3) on the inbreds 
than on the crosses. 

The component U is largest in the inbreds, somewhat smaller in the 
production-bred flock, and smallest in the crosses. Since this is a within 
pullet component, and hence due entirely to environmental factors (E in 
figure 2 and 3), it appears that the inbreds are more sensitive and the 
crosses less sensitive to this source of environmental variation than the 
relatively non-inbred production-bred pullets. 


CONCLUSIONS 


The analysis of the production-bred flock brought to light three 
factors which need to be considered in a breeding program for egg shape. 
First, the heritability as determined for this flock is moderately low 
ranging from .11 for a single second position egg to .19 for a mean meas- 
urement of two first position eggs (table 5). These are intermediate 
values such that combined selection would be considerably more effective 


‘than straight family selection or individual selection alone. Second, the 


heritabilities and correlations suggest that the most gains in the index of 
either position would be obtained by selection on the basis of first posi- 
tion eggs alone. However, selection without regard to position is almost | 
as effective and is much more feasible in practical breeding programs. 
Third, the presence of sex linkage in this flock was indicated by the 
analysis of the production-bred flock. The means and the analysis of 
variance of the crosses between inbred lines derived from the production- 
bred flock substantiate this conclusion. If all genes contributing to the 
variability of a character were sex-linked, full and half sister records 
would be of equal value, since females receive a sex chromosome from the 
sire but none from the dam. A progeny test of the females would add 
nothing since the correlation between dam and daughter due to sex- 
linked genes is zero. Information about the genetic worth of a cockerel 
with respect to the sex chromosome received from the sire can be ob- 
tained from his full and half sisters. A progeny test is necessary, to 
evaluate the sex chromosome received from his dam. ‘Thus, under 
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conditions comparable to those found in this flock for egg shape, a full 
scale breeding program designed to alter the mean of a character inher- 
ited in this manner would call for combined individual and full and half 
sister selection of pullets with sister and progeny testing of sires. In the 
case of egg shape where the desirable value is an intermediate one, 
somatic disassortative mating (the mating of phenotypically unlike indi- 
viduals) may be useful for production of commercial stock (Lerner, 
1950). 

Genetic drift (Wright, 1921, 1931) appears to have occurred in the 
inbred lines. Thus the difference between sires (S) from different lines is 
greater than the difference between sires in the relatively non-inbred 
production-bred flock (table 4). The component S is entirely genetic in 
composition (table 3) and is expected to increase as the coefficient of 
inbreeding increases. On the other hand, the difference between dams 
(D) appears to have increased with inbreeding contrary to expectations. 
Thus, both the production-bred flock and the crosses yield small positive 
values for the first position and negative values for the second position, 
whereas the inbred lines show sizable positive values. Whether this is 
the result of sampling errors due to the small number of dams involved or 
whether the genetic variance within inbred lines has actually increased is 
difficult to decide. Robertson (1952) has shown that if there is domi- 
nance, an increase in the genetic variance might be expected. 

The inbreds seem to be more sensitive to environmental conditions 
than are the crosses or the random-bred individuals from the production- 
bred flock. Thus the difference between clutches within pullets, strictly 
an environmental component, is greatest for the inbreds, next largest for 
the production-bred flock, and smallest for the crosses. The between 
pullet component (P) is also useful in this connection if it is assumed that 
the genetic variance has decreased within lines according to the expecta- 
tions given in table 3. The expected genetic composition of the P 
component is the same for inbreds and crosses. Hence, the greater value 
of P for the inbreds than for the crosses must reflect a greater environ- 
mental variance (cz) in the inbreds. The fact that the difference be- 
tween pullets (whether inbreds or crosses) from inbred dams is less than 
the difference between random-bred pullets from the production-bred 
flock would be attributed to a decrease in the genetic variance within 
lines upon inbreeding. This phenomenon was noted in the analysis of 
two other characters (November egg weight and shank length) in these 
same inbreds and crosses, namely, that the difference between pullets 
Was greater in the inbreds than in the crosses (table 8). 
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SUMMARY 


1. An analysis of variance and covariance was carried out for egg shape 
(100 width 
length 
each of 498 pullets in the U. C. production-bred flock, 134 pullets from 
eight inbred lines, and 154 pullets from crosses between these inbred 

lines. 

2. No interactions between clutch positions and other sources of 
variance (sires, dams, pullets) were found. 

3. The shape index for both clutch positions was greater in the second 
clutch than in the first clutch measured indicating that the shape index 
was increasing during the period in which measurements were taken 
(March and April). The difference was significant. 

4. The shape index of the second position was significantly higher 
than the index of the first position. If the distinction between positions 
is disregarded, the consequent reduction in heritability is negligible. 

5. Heritability estimates ranged from .11 on the basis of a single 
second position egg to .19 on the basis of the mean of two first position 
eggs. Heritability for the index of the first position was higher than that 
of the second position. 

6. The estimate of the theoretical maximum heritability (an infinite 
number of eggs measured per individual) was .25 and .16 for the first 
and second positions respectively. 

7. Evidence of sex linkage was obtained from the analysis of the 
production-bred flock and the crosses between inbred lines. 

8. Under the above conditions, improvement of a character (ex- 
pressed only in females) would be made most efficiently by the use of 
combined individual and sister (for sex-linked genes, full and half sister 
records are of equal value) selection of pullets and by the use of sister 
and progeny testing in the selection of males. In the case of sex linkage, 
daughters’ records are of no value in the selection of dams. 

9. The estimates of the components of correlation between first and 
second position indexes of the same bird and clutch in the production- 
bred flock are as follows: r* = 0.162, which is the correlation that would 
result were variation due only to environmental influence contributing 
to differences of individual eggs and different clutches of individual birds; 
r” = 1.013, which is the correlation that would result were variation due 
only to those additional environmental influences contributing to varia- 
tion between birds; and r° = 0.909, which is the correlation that would 
result were variation due only to genetic differences between birds. 

10. The eight inbred lines have become differentiated with respect to 
egg shape by genetic drift. No correlated response of egg shape to selec- 


) using the first and second position eggs of two clutches from 
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tion for November egg weight or egg number as practiced in these lines 
occurred. 

11. Inbreds were found to be more sensitive and the crosses less 
sensitive to environmental effects than the more or less randomly bred 
birds of the production-bred flock. It was also noted that the inbreds 
appeared to be more sensitive to environmental variations than the 
crosses for November egg weight and shank length. 
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A GENERALIZATION OF NEYMAN’S CONTAGIOUS 
DISTRIBUTIONS 


GEOFFREY BEALL AND RIcHARD R. ReEsctIa 
University of Connecticut 


1. Review of experience with Neyman’s contagious distributions. 


Neyman (1939) presented a new class of contagious distributions 
applicable to situations where individuals or items are supposed initially 
dispersed in randomly scattered groups—egg masses in the case of 
insects or clumps in the case of bacteria—which are subject to a chance 
fluctuation in size. It may be supposed that there occurs subsequently 
some spacial dispersion from the initial groups. Under such a scheme, 
the distribution of individuals, or items, cannot be random. Since the 
probability of an individual occurring in a given unit area is related 
to the probability of other individuals occurring in that unit area, the 
distributions may be called contagious. 

Upon the previous general grounds, Neyman has suggested three 
types of theoretical frequency distribution called Types A, B, and C. 
These types all correspond well to observed data for cases where the 
mechanism of dispersion is roughly that previously set forth. They 
proved better than a Poisson distribution and superior to the negative 
binomials, often used under such circumstances. The novel character 
of the new distributions may be seen, in the extreme, from the fact that 
they can be multimodal. This character is, of course, in violent contrast 
to that of the Poisson or negative binomial but yet corresponds very 
well with the biological reality. These conclusions were reached on 
various biological applications, as in the original work by Neyman 
(1939) and in work by one of the writers (Beall 1940) and by Archibald 
(1948). Often in this experience, Types A, B and C fit data progres- 
sively better although sometimes the sequence of types seems not to 
go far enough. For illustration, there is the case of Pyrausta nubilalis 
Hubn. in 1937 (Table VII, Beall 1940), where the types give values 
of x’ with progressively the probabilities of .011, .064 and .110. Asa 
second illustration there is the case of P. nubilalis in the same source 
but which has been reproduced in the present discussion as Table IT. 
It can be seen the values of x” again fall and have progressively the 
probabilities of .05, .48 and .57. 
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2. A general class of contagious distributions including those proposed by 
Neyman. 


The impression that Neyman’s types do not go far enough is of 
interest in view of the fact that their form suggests they are simply 
early members of a great family of distributions, as will be shown 
immediately below. The nature of this family may be seen by starting 
with Types A, B and C as special cases. 

Consider the characteristic functions of Neyman’s types i.e., 


va(t) = exp — 1}} 
ya(t) = exp — 1 — — 1)}/m,e* — 1)} (1) 
ec(t) = exp {2m fe"? — 1 — — 1) 

— m(e"' — 1)*}/m(e" — 1)*} 


' where m, and m, are parameters, of which the numerical value is peculiar 
to the type. Equations (1) suggest* the general form 


(2) 


sit 8 

exp {—m,} exp + 1) 
where the special cases of (1) arise for n = 0,1 and 2. Forn > 2 we 
should have new distributions. For n < 2 and not an integer, we should 
have cases intermediate to Neyman’s A, B and C. 

We note that ¢(¢) of (2) looks like a characteristic function and 
applying the general operation of finding quantities a, from expansion 
into the Maclaurin series of 


ol) = (3) 


r=0 


we note first that 


p> a, = ¢(0) = 1 (4) 
which is a necessary condition for g(t) to be a characteristic function. 
This means, of course, that if ¢(¢) is a characteristic function and the 
quantities a, are the probabilities of 0, 1, 2, etc., observations, then as is 
necessary they total to unity. It is however, further necessary to show 


*An alternate way of arriving at the generalization (2) is suggested in an Appendix to the present 
communication. 


| 
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that a, is nonnegative for all r, which is a more lengthy matter. This 
second condition means, of course, that in a true probabifity distri- 
bution, all the quantities a, , being probabilities, are nonnegative. 

To prove the second point let 


fe") = +) (5) 
whence and from (2), we have the generalized function 
= (6) 
Let us note that 
= (7) 
so that 
d'*'y 
d’-*y 
(k+1) 


where it can be seen that any value a, is expressible in terms of a) 
through a,_, , by means of the recurrent coefficients f“*"(0) /k! 
and so if these are always nonnegative and a) is nonnegative all a, 
will be nonnegative. 

In order to show f“** (0) is always positive it is easy to see from a 
simple division, by the power series of e-”’, that 


k+1) + s)(k 1)!m3***? 
+8+k + 2) (9) 


where the first factor is intrinsically nonnegative and the second is 
nonnegative whenever m, > 0, since n, k and s are taken always non- 
negative. It is at once obvious that a) = ¢(0) is necessarily nonnegative 
and therefore, in connection with (9), every a, is nonnegative. Re- 
turning to (3) it is now plain that ¢(¢) has the necessary and sufficient 
characters to be regarded as a characteristic function. We have, 
accordingly, obtained a generalization of Neyman’s contagious dis- 
tributions characterized by a critical parameter which we shall call 
0 <n < - and not necessarily an integer. For his original Type A, 
= 0, for B, n = 1 and for C, n = 


| 
| 
| 
| r+1 
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3. Calculation of the frequency distribution 
Following general procedure with characteristic functions, for any 
distribution, the probability of an observation of r events is 
P, =a, (10) 


which involve operations with e*', and this, for convenience, we shall 
replace by z. We turn from the characteristic function g(t) of (2) to 
the generating function 


¥(2) = exp {—m,} exp + 1) (11) 
or for n an integer 
Then 


P= (13) 


Setting in (12), n = 0, 1 or 2, successively we obtain the generating 
functions of Neyman (1939) for Types A, B and C, respectively. 
In calculating one finds from (5) and (13) 


P, en (14) 
Then from (10) and (8), conveniently 


(15) 


k=0 


where, we let 


(16) 


This recurrent relationship is that originally derived by Beall, as dis- 
cussed by Neyman (1939) in work on the latter’s Type A. Even the 
recurrent operation of (16) that involves us in — differentiation of 


i.e., 
f”0) = n!m k (s +- k)\(— — mz)" (18) 


slin+s+h)! 


| 

| 1 
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can, itself, however, prove difficult and tedious. Equation (18) is 
generally hard to evaluate so that it is worth while to point out three 
methods for its determination in any practical application. 

One method, that suggests itself, for finding successive values of 
f**” (0) is to note that 


k 


= mf"O +n — kin (19) 


as can be proven readily enough by replacing the expressions f by the 
general form of (18) and collecting on powers of (—m,). The successive 
values F, are readily obtained from (16). 

A second method for successive values of f““*” (0), which is the best in 
common practice, notes first that 


f(0) n! (—m,)’ 


s=0 (n + s)! (20) 
(—m,)" 


secondly that 


s(—m,)*~ 


(m, + n)f() —n 
and finally that for k > 1, from (18) by rearrangement, 


= (m+ 24+ Kf" O) — (22) 
From the foregoing and (16), values of F, are obtained, for example, 
= f'(0) (23) 

and fork > 1 
(24) 


The procedure just recommended, and in particular the use of 
(24), is a very rapid and convenient way of getting the necessary 
values F in (15) but does suffer from the defect in actual calculation 
that with successive steps rounding error in the earlier values, F, 
creeps towards the left of later values of F. There seems to be a ten- 
dency for the successive values, F, to lose almost a digit, so far as 
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accuracy is concerned, at each step so that even if f(0) is found with 
ten figure accuracy one can only go so high as to I’, or F\; . This 
tendency becomes more acute as n gets great. These shortcomings 
demand, for situations where the values of F, do not converge fairly 
rapidly, the development of yet a third method of finding the values 
of F, as is set forth immediately below. This third method is a little 
longer than that commonly recommended but is reliable to very high 
values of k. 

The recurrent coefficients may be conveniently calculated, for cases 
where n is an integer, by a method other than that previously employed. 
Consider the convenient quantities 


_ nlm; S (s + 
= k! st k! s=0 (n + s + k)!s! 
(k) 


where f,"’(0) is the value of f“ (0), in the sense of (19), for any given 
value of n. Let us consider the first negative forward difference, 


(25) 


n! m* > {s + (k + 1)}'(—m,)’ 


= (n-1) 
Plainly the second negative forward difference is 
= + Are 
n 
= in, — (n-1) 942} (27) 


n(n — 1) m3” (n—2) 

and the r-th negative forward difference is, by extension of such opera- 
tion, 


(28) 


A, = (n 


Accordingly, whatever the value of n (an integer) it is possible to obtain 
a sufficiently high negative forward difference so that there is involved 
= 
nl 
nim; 


~ 


| 

| 

| 
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These differences are easily calculated and are fairly rapidly convergent, 
so that the series of ,.0, can be conveniently built back. 

Let us not consider the obvious way of employing the previous 
results, i.e., by writing down a column of easily calculated negative 
forward differences, —A,, , neglecting all differences after they become 
vanishingly small, and building back. This procedure becomes heavy 
as m, gets great and the table must be made very long, in order to get 
reasonable accuracy. 

An alternate, less obvious but more practical way of employing 
the previous results involves accumulation down the columns so that 
the table may be made as short or as long as desired. It is necessary 
to invoke for k < 0, values 


= ¥ (s + k)\(—m,)’ 
(8 +k + n)!s! 


(30) 
_ nim, + 
! 
so that, conveniently, 
nO-1 = nm," 
= n(n — 1)m,* + nm;' (31) 


= n(n — 1)(m — 2)m;* + 2n(n — + nm;' 


etc. From the definition (30), the results of (26) to (29), of course 
apply. Accordingly, it is possible to build up a series of values ,6, 
for all k very conveniently, thus for instance, suppose we want to cal- 
culate .6, . We may construct Table I which is built downwards 
from these results. The values of f\"’(0) or of F, may again be found 
very simply from those for ,6; . 


4. The character of various contagious distributions, n an integer 

Let us consider the character of the generalized contagious dis- 
tribution. First we shall write from the characteristic function, (2) and 
from the rule that the r-th moment about origin of a distribution with 
characteristic function ¢(¢) is obtained from 


dt’ \t=0 (32) 


that 
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TABLE I. AN ILLUSTRATION FOR x = 2 OF THE DIFFERENCE METHOD 
FOR GETTING 


k 24, 
2 2 
m3 + Me 
2 
m3 
0 2!mz"e"™* 
1 These values 
2 may be e 
3 built up 
4 from the numerical a1 ~ 
ete. equivalents of those shown etc. 


wy = mm,/(n + 1) 
ares) 


_ 6m, 6m? 


from (21) we see that 


— = + 2) (34) 


so that the second moment, yu, , will always be greater than uw; , under 
our conditions that n and m, are positive. We see also that m, > 0 
from consideration of uj in (33). Also, us > ue. 

From (33), 


Mz = (n + 2)(u2 — 
m, = (n + 1)ui/m, 


(35) 


| 
| 
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and again, since generally n is a parameter, giving the type of distribu- 
tion, 


6(us + wine — wins — mi’) 
7 7 36 


The results, (35) and (36), will be later used in the practical estimation 
of m, , m, and n from empirical moments, along the lines of the pro- 
cedure of Neyman (1939) and Beall (1940). These results have, how- 
ever, other uses in the meantime. 

In certain connections, it is of interest to note that P, always de- 
creases as n increases from zero towards infinity, provided that m, and 
m, are chosen, as in the work of Neyman (1939), Beall (1940) and 
subsequent writers, so that uj and yu, are constant, i.e., identical with 
the empirical values. To see this point, note first from (14) that 


log Po = —m{1 — f,(0)} (37) 


—Ms 
(n +8 1)s! (38) 


and from (35) that 


d 
a = m,/(n + 1)(n + 2) 
(39) 
dm, _ 
dn = m,/ (n + 2) 
so that by simple but extensive operation 
d log Po _ me ™* (40) 


dn (n+ 1)n+2) 1)’ 


so that P, continually decreases as n increases, while m, and m, are 
positive, under which general condition we work. 
Let us consider the limiting form of the generalized distribution as 


no (41) 
We have from (35) 
m, = (42) 
and substituting (42) in (20) we have 
f.(0) =n! (_ (43) 
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In connection with the righthand member of (43), we may, as can be 
proven easily enough by simplification, note that, for the first k terms, 


nt @ +2)" 2 {1 4 = 9 


(n+s!\" 2) ~ 2+6¢ (n + 8)! 2 
n\(n + 2)* 
+h! \72 
hence, under (41), 
J.(0) 52 (45) 
for all c. 
Now from (35), under (41) 
m, uf (46) 
Accordingly 
Po = exp {m,{f,(0) — 1}} (47) 


2ut 
2 


Consider under (41), n — ~, now the general value of f{"(0) which 


is done most conveniently from the consideration of the quantities 
nO, of (25). Write from (26), 


and note that 
from (25) because the series involved is convergent for m, finite and 
Hl. (50) 


So from (48) and (49) we may write that under (41) and from (42), 


n 
(51) 
e 


so 


=> nOx (52) 
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But since from (45) and (25) 


(63) 
we can write that 
+ 2+¢ (54) 
and from (25) in conjunction with (16) under (41) 


We may note that (55) is convergent for c > 0; for the contagious 
distribution 
c 
0< <1 (56) 

Using (55), (47) and (46) in (15) it is very simple to get the frequency 
distribution as n — . The routine is discussed at some length in 
Section 7. 

It may be of interest to note the moments of the contagious dis- 
tribution under (41), as from (33), or in particular the relationship of 
the third moment to lower moments, i.e., 


38n+2, 
m= +04 S242 
2n+3 (57) 


ui(1 + 30 + 2) 
We may compare (57) with the situation obtaining for a binomial, i.e., 


Ms = wi(1 + 3c + 2c’) (58) 


where c has the meaning implicit in (42). The comparison of (57) 
with (58) may be of interest because it shows that even in the limit the 
new series of contagious distributions does not become identical with 
any binomial. The question might arise because obviously as from 
Fig. 1 and 2, the limit resembles at least superficially, say, a negative 
binomial much more than do.the early cases like n = 0 and n = 1. 
The question is of importance because Wadley (1950)* has pointed out 
that in many cases where a contagious distribution might have been 
anticipated, a negative binomial gives a better representation than at 
least the case of n = 0. His work might be very profitably extended to 


*While the present work was in press, Bliss, C. I. and R. A. Fisher, Biometrics 9: 176-200, have 
presented a further discussion in the same vein as that of Wadley. 
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a more general comparison of the negative binomial with contagious 
distributions. 


5. Applicability of the generalization in practice 


The foregoing generalization of Neyman’s contagious distributions 
will now be considered from the point of view of how well it agrees with 
field observation. Perhaps we should say cases will be considered 
where there is some reason to suppose that a better correspondence 
will be found between data and theory when n > 2, i.e., the case lies 
beyond Type C in Neyman’s terminology. A few other interesting 
cases will be considered. Only values of n an integer will be considered 
although intermediate, non-integer cases exist and are very similar. 
For all cases the primary data are presented. There are also shown 
fitted theoretical distributions for various low values of n and for 
n—» ©. Then values have been assumed but the necessary particular 
values of m, and m, have been found by the method of moments, as in 
(35). The possibility of the very interesting refinement to a maximum 
likelihood solution as presented by Shenton (1949) for the case of 
n = 0 has not been considered because it is extremely laborious even in 
this the most simple situation. It would be impossibly laborious for 
our fairly extensive exploration for various values of n. 

To begin, let us examine as in Table II and Fig. 1, the data of Beall 


TABLE II. OBSERVED FREQUENCY OF P. nubilalis IN 1936 (BEALL 1940) AND CONTA- 
GIOUS DISTRIBUTIONS WITH VARIOUS n. 


In- Type A|Type B/Type C 
sects | Obs. n=0 | n=1 n=2 | n=3 n=4 n=5 | n=6 n= n=8 | no 
0 24 27.0 24.5 23.4 22.8 22.4 22.2 22.0 21.9 21.7 20.8 
1 s\ 1.5 4.0 4.9 5.4 5.7 5.9 6.0 6.1 6.2 6.9 
2 4 3.0 4.1 4.6 4.9 5.0 5.1 5.2 5.3 5.3 5.7 
3 5 4.1 4.1 4.2 4.3 4.4 4.4 4.4 4.5 4.5 4.7 
4 3 4.3 3.8 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.8 
5 2 3.8 3.2 3.2 3.1 3.1 3.1 3.1 3.1 3.1 3.1 
6 1 3.0 2.8 2.7 2.6 2.6 2.5 2.5 2.5 2.5 2.4 
7 ‘\ 2.3 2.2 2.2 2.1 2.1 2.0 2.0 2.0 2.0 1.9 
8 2 L.7 1.8 1.7 i 1.6 1.6 1.6 1.6 1.6 1.5 
9 1 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.2 1.2 1.2 
10 2 1.0 1.0 1.0 1.0 1.0 1.0 1.0 9 9 9 
11 3 8 8 8 8 8 8 
12 1 6 6 6 6 6 6 6 5 5 6 
13+ 1 1.6 1.8 1.7 1.7 a7 1.8 1.8 2.0 2.2 1.8 
x? 12.80 5.49 4.81 4.60 4.62 4.62 4.69 5.71 6.23 6.21 
P,? 05 48 57 .60 60 60 58 .46 .40 .40 


a 
j 
— 
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AS IN TABLE II. 


In 
24 241 
4 
18; is 
n=O n=! 
12; 
6 6. 
3 9 12 3 
24 24 
is 18: 
n=2 
12 
3 6 9 12 6 12 
| 
is 18 
n=4 
12 12 
3 6 3 12 3 6 9 
FIG. 1. FREQUENCY OF OBSERVATIONS 0, 1, 2, ETC. WITH VARIOUS n FOR P. nubilalis, 


(1940) on the frequency of the European corn-borer, Pyrausta nubilalis 
Hubn., on small unit areas of a field as observed in 1936. It can be 
seen that for various values of n (the first and second moments constant) 
in the present case there is not exhibited the potential multimodality, 
apart from the bimodality that occurs for n = 0 and 1. The tail of 
the distributions is remarkably stable. Indeed, the main change that 
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occurs is a diminution of P, and a compensating augmentation of P; 
with some slight disturbance up to P,; or P,. The pace of such change 
is great for the early values of n but very small when n becomes sub- 
stantial. The change in x’ reflects, of course, the situation just dis- 
cussed, in that it changes rapidly at first and then increasingly slowly. 
Consideration of the value of x’, or better of its probability shown in the 
table, confirms our initial suspicion that the improvement apparent 
previous to the present investigation, as one goes from the case of 
n = 0 to the case of n = 2, continues, in some measure for higher n. 
In fact n = 3 ton = 5 gives the best fit. As m further increases the 
fit deteriorates slightly. 

The senior author (Beall 1940) presented comparable data for 
several years on P. nubilalis for which the situation is very much the 
same in each year, so that they have not been considered at such length 
as the data of 1936. Only for 1937 is the situation considered briefly, 
as in Table III. The results, like those for 1936, but in more extreme 
form, are typical of the situation that prompted the present investiga- 
tion, i.e., the correspondence of theory and observation seemed to 
improve markedly as n increased to 2; but 2 seemed not enough. The 
fits for n = 8 and n — o have been added and improve markedly to 
become quite good. Further cases were not studied because it appears 
that the case of n > © is probably the best. This conclusion is reached 
from the consideration of the contribution to x’ from various parts of 
the distribution. First, we can see that the theoretical P, drops always 
towards the observed proportion steadily and continuously {Equ. 
(40)} as” increases. Secondly, the theoretical P, rises steadily towards 
the observation. Thirdly, the contribution to x’ from the remaining 
classes P, , P; , etc., are very much the same for all n. It should be 
noted that here the results are to be contrasted with those for 1936 
where n was best short of @. 

One more example may perhaps be profitably considered from the 
work of Beall (1940), that of data on the potato beetle, Leptinotarsa 
decemlineata. Say, (Treatment 3 of the original paper) which had 
previously proven ill-fitted by n = 0, 1 and 2. In Table III n >= is 
added. The fit may be conceived improved but is still poor. As was 
discussed in the original paper, the potato beetles were at various stages 
of development and hence we may suppose that various contagious 
distributions were superimposed and our present simple theory is hardly 
applicable. 

It should be noted that in some of the work of Beall (1940), n = 0 
fitted well. We have picked cases mainly where n = 0 did not work. 
Thus in four series on Loxostege sticticalis L., n = 0 gave effectively a 


368 BIOMETRICS, SEPTEMBER 1953 


TABLE III. OBSERVED FREQUENCY OF P. nubilalis IN 1937 OR L. decemlineata (BEALL 
1940) AND CONTAGIOUS DISTRIBUTIONS WITH VARIOUS n. 


Pyrausta nubilalis 


Insects Obs. n = 0* n= 1* n = 2* n=8 no 
0 33 37.8 37.1 36.8 36.1 35.7 
1 12 5.6 6.8 7.3 8.3 8.8 
2 6 5.2 5.0 5.0 5.0 5.1 
3 3 3.5 3.2 3.1 2.9 2.9 
4 1 1.9 1.9 1.8 By 1.6 
5+ 1 2.0 20 2.0 2.0 1.9 
x? 9.04 5.57 4.47 2.90 2.17 
Px? .064 .24 .34 

Leptinotarsa decemlineata 

Insects Obs n = 0* n= n = 2* n—o© 
0 33 47.7 44.6 43.2 39.5 
1 12 4 27 3:7 6.0 
2 5 ie | 2.8 3.4 4.9 
3 6 2.0 2.8 3.2 3.9 
4 5 2.8 2.8 2.8 3.2 
5 0 3.2 2.6 2.5 2.5 
6 3 3.0 2.4 2.2 2.0 
7 2 2.5 24 1.9 1.6 
8 2 1.9 er f 1.6 1.3 
9 0 1.4 1.4 1.3 1.0 

10 1 1.0 1.0 1.0 8 

11+ 3.0 3.1 3.2 3.3 
x? 29.12 12.26 
Px? .000 ,061 06 


*Fit as in Beall (1940). 


better fit than n = 1 and this a better fit than n = 2. In each of these 
cases we note that P, was lower than the observed initial frequency. 
In accordance with (40) this tends to explain why higher n were not 
helpful. 
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Consideration of the change in character of the various contagious 
distributions in Table II, suggests that a case with some n > 0 may 
solve the problem that vexed Fracker & Brischle (1944). They worked 
with the frequency of plants, Ribes spp., in quadrats. They found 
that the case of n = 0 (Type A) was vastly better than the Poisson, 
but seemed to go too far and considered something intermediate between 
Type A and the Poisson. The higher members of the family as proposed 
in the present paper probably meet their problem. In illustration we 
have fitted for their case at Kaniksu, Idaho, for the n — © which does 
well as in Table IV. 

Consideration of the P, and P, indicates that no finite n would do 
better. As a second illustration from their work, there is presented data 
from Clearwater, Idaho, using their 8% sample (they used the data 
on the 4% sample). Again the case of n — o gives a fair fit. In this 
case we have calculated for n = 0, which is inferior. 

In Table V, there is considered the fit for » — © on the plant 
Lespedeza capitata as reported by Thomson (1952), it can be seen that 
there is some improvement over the situation for n = 0, although 
even for the latter the value of x’ would plainly be significant. We 
have added the maximum likelihood solution (following Shenton 1949) 
which seems to have repaid poorly all the trouble of its calculation. 
This is probably to be expected because from the nature of the solution 
it is directed towards the estimation of parameters and not at obtaining 
a theoretical distribution in agreement with observations. 

One of the most successful applications of Neyman’s contagious 
‘distributions was made by Archibald (1948) who studied the variation 
of various plants on small unit areas. This success doubtlessly occurred 
because the essential physical conditions underlying the variation in 
the plants were those originally considered by Neyman. There was 
involved an initial random distribution of mother plants from which 
there arose families of various size scattered about the site of the mother 
plant. For a number of species, Archibald fitted for n = 0, and this 
case only. For most species the fit was satisfactory but for one Sali- 
cornia stricta it was reported very poor (the x’ had a probability of 
.01) and we have considered this situation closely. It appears, however, 
in the first place, that some slip, on the part of Archibald, is involved, 
as is shown in Table VI, where x” = 18.67 and its probability is .18. 
We have considered n = 0 and 5 and — o. It appears that the fit 
becomes progressively better. It is probable that Archibald would also 
have gotten even better fits for at least some of the other species if 
she had used n > 0. This result is interesting because in this particular 
realm of botanical colonization, Thomas (1949) has suggested a dis- 
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TABLE IV. OBSERVED FREQUENCY OF Ribes (FRACKER & BRISCHLE 1944) AND 


CONTAGIOUS DISTRIBUTIONS 
KANIKSU, IDAHO (4% SAMPLE) 
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Ribes Obs. n = 0* no 
0 43 50.4 46.2 
1 15 5.9 11.9 
2 7.2 7.9 
3 6 6.0 5.1 
4 3 4.1 3.3 
5 4 2.5 2.1 
6 0 1.0 1.3 
7+ 1 3.1 2.2 
x? 17.58 3.45 
Px? .000 , 55 .33 

CLEARWATER, IDAHO (8% SAMPLE) 

Ribes Obs. n=0 
0 26 41.1 31.7 
1 9 a 4.6 
2 7 3 4.0 
3 3 7 3.5 
4 4 1.2 3.0 
5 2 1.9 2.6 
6 1 2.4 2.2 
7 3 2.7 1.9 
8 1 2.6 1.6 
9 0 2.3 1.4 
10 1 1.8 1.2 
11 0 1.4 1.0 
12 1 1.0 8 
13 1 8 4 
14 2 6 6 
15 1 5 5 
16 1 5 4 
17+ 1 2.1 2.3 
x? 68.23 7.24 
000 ,000 .066 


*Fit of Fracker and Brischle (1944). 


| 
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n=O 
6) ta 
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10, 
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FIG) 2. FREQUENCY OF OBSERVATIONS 0, 1, 2, ETC. WITH VARIOUS n for S. stricta, 
AS IN TABLE VI. 


‘ | 
| 
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TABLE V. OBSERVED FREQUENCY OF Lespedeza capitata (THOMSON 1952) AND 
CONTAGIOUS DISTRIBUTIONS. 


Max. Lik.* 
Plants Obs. n = 0* no n=0 
0 7178 7265.1 7217.6 7188.4 
1 286 134.0 218.6 219.6 
2 93 118.9 105.5 140.6 
3 40 71.1 50.9 61.6 
4 24 32.5 24.5 21.1 
5 7 12.3 11.8 6.2 
6 5 4.2 5.7 17 
7 1 1.3 2.8 5 
8 2 4 1.3 
9 1 | .0 
10 2 .0 3 .0 
ll 1 .2 
12+ 0 .0 2 .0 
*As calculated by Thomson. 


tribution only slightly different from the case of n = 0. In studies in 
the field it might be seriously considered whether Thomas’ distribution 
cannot be bettered by some member of the family of contagious dis- 
tributions. 

The distributions for n = 0 and 5 and for n > o, in the case of 
S. stricta, are shown in Fig. 2, because they illustrate rather boldly the 
change characteristic of increasing n. As in Fig. 1 for n = 0 there is 
bimodality which finally is replaced by a unimodal distribution. It is 
worthy of note, that in Fig. 2, the bimodality persists so late as n = 5. 

In contrast to the experience of Archibald, in that of Upholt and 
Craig (1940) contagious distributions failed to fit data on black scale 
(insects). The difficulty is apparently similar to that reported in Table 
III on L. decemlineata in that the actual machinery of dispersion of the 
insects is not close enough to that supposed fundamentally to exist. In 
the case of L. decemlineata, the failure was probably due to the super- 
imposition of larvae of various age with various distribution. In the 
case of the black scales, according to Upholt and Craig, the failure is 
probably due to the fact that one is dealing not with the distribution of 
daughters arising from an initial random dispersion of mother scales 
but with granddaughters, dispersed in various numbers from daughter 
scales themselves in a contagious distribution, and so on for several 
generations. Present endeavors support Upholt and Craig in failing to 
find a suitable contagious distribution; as can be seen in Table VII 
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TABLE VI. OBSERVED FREQUENCY OPS. stricta (ARCHIBALD 19 48) AND 
CONTAGIOUS DISTRIBUTION WITH VARIOUS n. 


Plants Obs n=0 n=5 
0 4 10.4 6.9 6.0 
1 3 4.0 6.7 6.9 
2 8 6.6 wie 8.0 
3 13 7.2 8.2 8.5 
4 11 8.2 8.3 8.6 
5 9 8.0 8.1 8.3 
6 8 7.8 
7 10 | 7.0 
8 3 6.4 6.3 6.3 
9 3 5.5 5.5 
10 8 5.0 4.8 4.7 
11 3 4.2 4.1 4.0 

12 4 3.5 3.4 3.3 
13 4 2.9 2.8 27 
14 0 2.4 2.3 2:2 
15 3 1.9 1.9 1.8 
16 0 1.5 1.5 1.4 
17 0 1.2 4.2 
18 1 9 1.0 9 
19+ 3 2.6 2.6 2.9 
x? 18.67 17.20 15.32 
Px? 18 25 .36 


where the fit for n = 2 for n — © are shown but no correspondence 
with the empirical distribution appears. 

The data of Marshall (1936) on frequency per plant of larvae of the 
American bollworm, Heliothis obsoleta F. on maize may be of interest 
because it has attracted considerable attention and because there 
exists a possibility that it is contagious. As explained by Marshall, the 
moth lays eggs singly. There seems, to the present authors, to exist, 
however, a possibility that a moth selects a plant randomly and then 
lays several eggs on that plant before moving on to another plant and 
that this group of eggs is somewhat analogous to the egg mass of the 
corn borer. Then we may expect there to be a random selection of 
sites on each of which a random number of eggs is laid. If such is the 
situation a contagious distribution should result. For these data, as in 
Table V, there is apparent no reason why any other value of n will 
better n — . For this n the value of x’ is better than that for n = 0, 
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TABLE VII. OBSERVED FREQUENCY OF BLACK SCALES (UPHOLT AND CRAIG 
1940) AND TWO CONTAGIOUS DISTRIBUTIONS. 


Scales Obs. n = 2* 
0 199 497.1 436.4 
1 124 16.0 31.4 
2 106 8.0 28.9 
3 65 5.2 26.7 
4 42 3.8 25.0 
5+ 285 290.9 272.6 


*As calculated by Upholt and Craig. 


TABLE VIII. OBSERVED FREQUENCY OF H. obsoleta (MARSHALL 1936) AND 
FIRST TWO CONTAGIOUS DISTRIBUTIONS AND THEN THE GREENWOOD- 
YULE DISTRIBUTION 


Insects Obs. n=0 Greenwood- 
Yule Dist.* 
0 206 267.3 226.2 200.4 
1 143 90.5 140.7 165.8 
2 128 105.3 113.9 123.8 
3 107 91.8 87.7 89.7 
4 71 70.3 65.0 62.9 
5 36 51.0 46.9 43.9 
6 32 36.0 33.0 30.4 
7 17 24.8 22.9 20.9 
8 14 16.5 15.6 14.3 
9 7 10.8 10.4 9.8 
10 7 6.9 6.9 6.7 
ll 2 4.3 4.6 4.5 
12 3 2.6 3.0 3.1 
13 3) 1.6 1.9 2.1 
14 1 9 1.2 1.4 
15 1 .6 8 1.0 
16 1 3 1.9 
17 2 2 3 
18 1 2 
19+ 0 2 3 
x? 68.92 17.99 13.58 
Px? .000 .082 . 257 


*As calculated by Walker (1942). 
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but not good. These data are apparently not in a contagious distribu- 
tion. On Marshall’s data, Walker (1942) made a study, from which 
we should note particularly the fit of the Greenwood-Yule distribution 
in Table VIII, which is fairly good. This distribution supposes that the 
larvae occur randomly in a small locality but that their probability 
varies over the field in a Pearson Type III distribution. 

In final illustration of the use of contagious distributions with high 
n it may be interesting to consider, as in Table IX, the data on yeast 


TABLE IX. OBSERVED FREQUENCY OF YEAST CELLS (NEYMAN 1939) AND TWO 
CONTAGIOUS DISTRIBUTIONS. 


Cells Obs. n = 0* no 
0 213 214.8 214.4 
1 128 121.3 122.1 
2 37 45.7 45.3 
3 18 13.7 13.5 
4 3 3.6 3.5 
5 1 8 8 
6+ 0 4 


*As calculated by Neyman. 


due to “Student” (1907) in Neyman’s (1939) original paper. He 
found that n = 0 fitted well but as can be seen n — © is at least as 
apt. This case is curious because it seems immaterial what value is 
given n. 

To conclude the present section we may make some observations on 
those cases where the contagious distributions might have been expected 
to apply and did so in fact, by the judicious choice of n. The similarity 
of the tails for all n indicates perhaps that our attention should be focused 
upon the first few members of the distribution. Among these members, 
P, is of transcendent importance. For n = 0 the theoretical value 
may be manifestly too great, whence we are advised by (40) to try 
n > 0, since we know this theoretical member will always diminish 
with increase in n. The behavior of P, for various n may be judged 
more fully from the various preceding tables. The value of P, may 
also be profitably considered since it seems generally to be greater for 
n = 1 than for n = O (and particularly greater if Py is much greater 
than P,) and this is another indication of the proper value of n. Our 
approach is similar to that of Thomas (1949), who fitted by making 
theoretical and observed class frequencies agree. We, however, are 
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inclined to do so only in connection with the estimation of n, while we 
should still get m, and m, by the method of moments. 

We may further make some general observations, on the goodness 
of fit for various values of n. From a fairly wide experience it seems 
that x” changes very rapidly when n is small but increasingly slowly 
as n becomes great. Thus in Table II, n = 8 and n — o gave us 
6.23 and 6.21, respectively, for values of x’. Hence if a high n is called 
for, we may have an unlimited choice of the n to be used. Since the 
case of n — o is the easiest to calculate, it would be chosen. 


6. The difficulty of explicit determination of n 


From the results of Section 5, it appears that contagious distributions 
may be fitted to many data, but there is perhaps a question as to the 
appropriate n. Some way should be found to get it empirically. The 
difficulty is the same as in Neyman’s original paper where there was 
no clear indication as to which of the three Types A, B or C, i.e., n = 0, 
1 or 2, should be chosen. The writers have considered the possibility 
of getting n by the method of moments (as for m, and m,) but the 
outcome has been unhappy. 

Let us first consider the estimation of m, , m, and n simultaneously 
from (35) and (36), for the data on P. nubilalis, in 1936, of Table I. 
We find pi = 2.982,142,857, u2 = 14.946,109,69 and uw, = 42.619,254,54 
80 that we estimate n = —1.95 (m, = —25.61 and m, = +.11). It 
should be noted that n is negative. In violent contrast, Table II 
suggests that, at least from the point of view of getting a minimal ’, 
under the condition that m, and m, yield the first two empirical moments, 
n should be about 4. For the data on P. nubilalis, in 1937, the moments 
are wy = .821,428,571,4, uw. = 2.182,397,959 and uw; = 10.746,264,58 
and on estimating by the method of moments we get n = —4.38 
(m, = 1.41 and m, = —1.97), again negative. Similarly for P. nubilalis, 
of 1938, of the same series, we find from the empirical moments, 
n = +.24 which we would call too small, since it is close to Neyman’s 
Type A which Beall (1940) found inferior to Type C. So we may say 
that again n is erring in the negative direction. 

Let us consider some more data on P. nubilalis Hubn., from Beall 
(1940, Table II, Treatment 1). From the empirical moments in (62), 
n = 1.95. It appears however, that following our procedure in Table II, 
etc., some more extreme case (n > 2) would give a better fit. If such 
is the case our estimate of n is low. Similarly for the data on P. nubilalis, 
of Beall (1940, Table II, Treatment 4), n = —4.40. The method of 
moments again seems no wise satisfactory. 


{ 
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Generally speaking, the foregoing work suggests that (36) with 
empirical moments, substituted, gives estimates of n too small and 
certainly bad. There is, of course, little reason why values of the 
parameters, so obtained, should give very good correspondence between 
the theoretical and empirical distributions, although the method does 
work in some connections. Further, it would not seem that the third 
moment could be highly sensitive to changes in n. This can be seen 
for instance from Table II where there is a fairly extensive exploration of 
the variation in the contagious distributions as n varies. Obviously 
the third moment varies little with change in n, i.e., the variation is 
mostly in the values of P, , P,; and a few early members while the tail 
of the distribution is little changed. Such has tended to be our experience 
with data generally. Under these circumstances (36) becomes very 
sensitive to chance fluctuations and n becomes more or less random. 
The suggestion that n seems to be underestimated seems hardly worthy 
of further exploration. ; 

It seems that in practice one should choose as in Section 5, the value 
of n from a consideration of the proportion of cases in the first class, 
i.e., the proportion of observations 0, and then find m, and m, by the 
method of moments. In the use of the proportion of cases in the first 
class, our procedure is not unlike that of Thomas (1949) who used this 
as one consideration for finding the values of m, and m, . 


7. Illustrative application of procedure 


In order to make abundantly clear the actual procedure of fitting 
the generalized contagious distributions, with any value of n, there 
follow a few simple illustrations. The data are those of Table II, on 
P. nubilalis for 1936. The principal illustration is that of fitting for 
n = 2 (Type C). 

In the first place it should be clearly realized that while the theory 
rests on certain characteristic functions, in particular on Equ. (2), and 
on the associated generating functions and while these involve awkward 
operations of differentiation these are not employed in the routine of 
calculation. This is done by the use of certain recurrent functions, as in 
Section 3. We shall consider the procedure hinging on Equ. (22) and 
previously designated as that commonly recommended. The total 
operation, discussed in great detail below, is directed toward the cal- 
culation of the successive probabilities P, . In order to obtain these 
we must first calculate the numerical coefficients F, . Generally, 
however, these coefficients must themselves be calculated from an 
antecedent recurrent operation. 


a 
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We must first estimate the values m, and m, from Equ. (35). If, for 
example, we choose n = 2, we have accordingly ’. 


(us — ui) _ 14.946,109,693 — 2.982,142,857 | 
2.982,142,857 


= 8.023,738,236 
_ 3(2.982,142,857) 
 8.023,738,236 


Parenthetically, we may note that the second moment should be cal- 
culated by the formula 


1 


= 1.114,995,069 


1 one 
In = y D(X (60) 
i.e., the divisor is N not N — 1 as is done in estimating standard devia- 
tions in certain connections. Note also in all these calculations it is 
necessary to use a surprisingly large number of places. 
Towards the recurrent coefficients F, find, repeating (20), 
For the illustrative case of n = 2, 


SO 2(—8.023 1 + 8.023 ,738,236) 


(62) 
= .218,205,184,4 
The second step is to calculate from (21) and (16) 
F, = (m, + n)f@) —n 
= (10.023,738,236)(.218,205,184,4) — 2 
= .187,231,651 (63) 
The third step is to calculate 
F, = {(m, +n)’ + n}f(0) — n{(m, +n) + 1} 
= .313,171,436 (64) 


Now using these two first values, it is necessary to compute for all higher 
P, the comparatively simple Table X. Numerically, the procedure is 
as in Table XI. Column (2), for a given k, is the product of a value in 
Column (1) and the value of m, , F in the preceding row all divided by k, 
ie., the value in column (2) when k = 3 is [(13.023,738,236) (.424, 
197,564) /3] = 1.841,546,010. Column (3), for a given k, is the product 
of m, and a value of m,F in the second preceding row divided by k — 1, 
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TABLE X. THE CALCULATION OF RECURRENT COEFFICIENTS FOR THE - 
GENERAL CASE. 

m n+k —m 
0 Me + n m Fo 
mtn+1 mF, 

m n+2 
2|m+n+2 + + mF, mF mF, 

m n+3 —m 
3 m,+n+3 + mF, — mF mF; 

TABLE XI. NUMERICAL ILLUSTRATION FOR P. nubilalis 1936 OF GETTING THE 
RECURRENT COEFFICIENTS, FOR n = 2. 
(1) (2) (3) (4) 
m 2+k 
k m+t+2+k 2 m,F,-2 
k k-1 
X mF;,-; 

0 . 208 , 762 ,368 
1 .349 , 184,607 
2 12.023,738,24 | 2.099,252,155| —1.675,054,591 .424, 197 ,564 
3 13.023 ,738,24 | 1.841,546,010) —1.400,882,941 .440 ,663 ,069 
4 14.023,738,24 | 1.544,935,884| —1.134,550,071 .410 ,385 ,813 
5 15.023,738,24 | 1.233,105,807} — .883,941,280 .349 , 164,527 
6 16.023 ,738 , 24 .932,486,774| — .658,565,668 . 273 ,921, 106 
7 17.023 ,738 , 24 .666 , 166,023} — .466,934,127 . 199,231,896 
8 18.023 ,738, 24 .448 ,862,943) — .313,981,672 . 134,881,271 
9 19.023 , 738 , 24 .285,105,111) — .199,823,073 .085 , 282 ,038 
10 20.023 , 738 , 24 .170,766,520) — .120,250,223 .050 ,516 , 297 
1l 21.023 , 738,24 .096 549,218) — .068,428,075 .028, 121,143 
12 22.023 , 738,24 .051,611,059) — .036,848,140 .014, 762,919 
13 23 .023 , 738 , 24 .026,145,968} — .018,803,058 .007 ,342 ,910 
14 24.023 ,738 ,24 .012,600,297} — .009,111,831 .003 , 488 , 466 


i.e., the value in column (3) when k = 3 is [(—8.023,738,236) (.349,184, 
607)/2] = —1.400,882,941. Column (4) is the sum of the items of the 


same line but in columns (2) and (3). 
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In order to get finally the probabilities P, for the case of n = 2 
from (14) 


P, = exp {m,{f,(0) — 1}} 
exp {1.114,995,069(.218,205,184,4 — 1)} 


= .418,24 (65) 


Then all the succeeding values P, , P, , etc., are calculated, as in Table 
XII when the m,F values are those proper for n = 2, as from Table XI. 


TABLE XII. THE CALCULATION OF SUCCESSIVE PROBABILITIES P,; FOR ANY 
VALUE OF n. 


Tr P, 


as in Equ. (65) 

P; = (mF o)Po 

P2 = {(miFo)Pi + (mF 1)Po} /2 

Ps = {(miFo)P2 + + (mF 2)Po}/3 

= o)Ps + (mF 1)P2 + (mF 2)Ps + (miF3)Po}/4 


© 


TABLE XIII. NUMERICAL ILLUSTRATION FOR P. nubilalis OF GETTING FOR n = 2 
THE SUCCESSIVE PROBABILITIES, P,. 


P, 


, 24 

-087,31 = (.208,762)(.418,24) 

.082,14 = {(.349,185)(.418,24) + (.208,762)(.087,31)}/2 
-075,02 = {(.424,198)(.418,24) + (.349, 185)(.087,31) 
.066 , 42 { (.208,762)(.082, 14) } /3 


.047,50 


11 .014,03 ete. 


v 
12 .010 ,63 
13 .008 ,02 
14 .006 ,01 
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In practice it is convenient to rewrite those coefficients so that the last 
becomes first and the first last and then apply the inverted series 
m,F, to the column of P, as it develops. To get each new value of P, 
the inverted column is then slid down one step. The results are in 
Table XIII. 

For any other value of n, excepting n = 0 and n > o, the recurrent 
coefficients are calculated in essentially the same way as for n = 2. 
In Equ. (59), (60), (61), (63) and (64) and in Table X, the appropriate 
value of n must be used. The remaining references for P etc. are 
identical with those for the case of n = 2. 

The case of n = 0 is peculiar in that the recurrent coefficients F, , 
can be found rather simply. Note that in general from (16) and (18), 


—ms 
(66) 


Then 
F, = (67) 


and F, = m,F,. The subsequent, successive values of F, , Fs; , etc., 
may be found very simply on the general principle illustrated by the 
operation, F, = m,F,/2, F; = m,F,/3, etc. Then the successive 
values of P, should be calculated as in Table XII, employing from (14), 


P, = exp {—m,} exp {me""} (68) 


The calculation for the case of n — © is numerically, particularly 
simple. There is no value m, , but rather the matter can be most 
conveniently arranged about the numbers: 


c= — 4.011,869,118 (69) 
1 
and 
c 4.011,869,118 
2+c  6.011,869,118 (70) 
and also 
= = 1.486,660,092 (71) 


The calculation of the values m,F, is very simple. First we calculate 


(2 + ¢)’ 


mF, = 


i 
| _ 
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and then successively, 


mF, = mF, , mF, = 


2+¢c 
4 

mF; = 32+¢ mF, ’ etc. 


For the numerical illustration of P. nubilalis, 1936, of Table II, 


= .330,04) 
mF, = .440,49 | (72) 
mF, = .440,93 
mF, = .392,32) 
Finally, by direct calculation, 
P, = exp (73) 


The other P, values, i.e., P; , P2 , Ps , etc., are again calculated from 
Table XII. 

It cannot be denied that the calculation of contagious distributions 
is a little involved, although the difficulties are minimal for n = 0 and 
n— . There has appeared no practical solution by way of the forma- 
tion of facilitating tables. Such has been attempted by Thomson 
(1952) for the case of nm = 0 but little seems to have been gained in 
this, one of the two most simple cases. 


8. Summary and discussion 


Neyman (1939) proposed a class of contagious distributions appli- 
cable to situations where individuals or items are supposed initially 
dispersed in randomly scattered groups—egg masses in the case of 
insects or clumps in the case of bacteria—which are subject to a chance 
graduation in size. We have obtained a generalized contagious dis- 
tribution, characterized by a critical parameter which we call0 <n < @, 
and not necessarily an integer. Neyman’s original types are cases 
n = 0,1 or 2. 

There existed a considerable problem in calculating the generalized 
contagious distributions, which the senior author had previously solved 
in part by the development of a recurrent method for finding the 
probabilities P, of successive observations 0, 1,2 --- r---+. This is 
discussed by Neyman (1939:47). While this approach made it physically 
possible to get the distributions the work remained heavy, except for 
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n = 0. The problem is that the recurrent coefficients may themselves 
be difficult to find. The writers, accordingly, have developed two 
schemes for finding these coefficients by secondary recurrent methods. 
One of these methods is particularly simple in that only two previous 
members of a series of coefficients are needed to calculate the next. 
They have also developed a scheme for building up the coefficients by a 
method of differences, for n an integer, which is particularly accurate 
and useful in some connections. These methods make it almost as 
easy to calculate a contagious distribution as a binomial. 

The form of the generalization as n > © is naturally of interest. It 
resembles somewhat a negative binomial, although not identical. Very 
frequently it fits data very well. In the cases we have examined n 
increasing beyond say 10 makes very little difference and perhaps the 
case for n — © may often be used, particularly, since it is very easily 
calculated. 

Alternate forms (n varying) of generalized distribution were tried 
against various empirical distributions. The value of m was chosen 
arbitrarily and the other parameters were found by the method of 
_ moments. The various examples were chosen from the literature; in 
particular cases, where n = 0 had been found to fit poorly, were con- 
sidered. Cases exist where n = 0 gave the best fit, although from our 
choice of material this class was ill explored. There was found only 
one case where 0 < n < © gave the best fit—theoretically this class 
may be fairly frequent. There were several cases where n — © gave 
the best fit. There was one case where the magnitude of n seemed 
immaterial. In some cases where quite a good fit had been obtained 
with n = 0, it was bettered by n > 0. In two cases where the known 
details of insect distribution in no wise corresponded to those pre- 
supposed in a Neyman type distribution (random distribution of groups 
of random magnitude) the fit was bad. 

There exists a considerable problem as to the best means of deciding 
the value of n, characterizing the type of distribution. The method of 
moments seems a priori and a posteriori hopeless. The best method 
seems to be that of setting n such that P, , the proportion of zero ob- 
servations, agrees with the empirical result while the other two param- 
eters m, and m, are chosen so that the theoretical and empirical moments 
agree. 

The present study is necessarily restricted to the defined purpose of 
making a fairly obvious extension of the class of contagious distributions 
proposed by Neyman (1939). This has been accomplished and the 
generalized family, not unexpectedly, can better represent empirical 
data than the former part of the family. The utility of the generaliza- 


j 
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tion, in dealing with biological data will be the same as that of Neyman’s 
previous types and this he has discussed at considerable length in the 
concluding remarks of his original paper of 1939. 

The present results will- permit explorations to be made for the ex- 
istence of phenomena that may be graduated by contagious distributions. 
This may be useful in trying to determine the machinery of dispersion. 
Experience with contagious distributions may also be useful in the 
analysis of experimental results; such was the intention of the whole 
field of investigation in the first place. It may be noted that the trans- 
formation of Beall (1942) from observations z to 


= sinh”! (qzx)'”? (74) 


may have some applicability to contagious distributions since it is 
based on a presumption that 


Be = + qui? (75) 


where g is some constant and this situation exists for the contagious 
distributions, as for the negative binomial. It is, however, probable 
that the transformation is not so apt for the contagious distribution 
as for the negative binomial because although in the former case the 
standard deviation may be made independent of the mean it can 
hardly be supposed that, in general, the distribution is normalized, 
as seems to be approximately the case for the binomial. The whole 
problem and the role, however, of such transformations is very ill 
understood and we cannot be final. 

The writers are beholden to Professor Jerzy Neyman for his advice 
at the beginning of the present work. 
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Appendix—An alternate generalization of the contagious distr.butions. 


The generalization (2) of Neyman’s contagious distributions is 
fairly obviously suggested by the form of Equ. (1); it may however be 
also shown to arise by an extension of Neyman’s operations in setting 
up his types A, B and C. We may start with the general characteristic 
function of a contagious distribution in the form of Neyman’s Equ. (26), 


it 
log = — 1) + Am ) (76) 
n=2 
where P,, are moments of a random variable Z {which Neyman has 
identified with his P (£, 7)} the probability that a given individual 
caterpillar etc., will be found within an experimental plot. There is an 
important condition that always P,; = A~'. Neyman then introduces 
the function F(z) where z is any number between zero and unity such 
that F(z) will possess all the properties of the integral probability law 
of Z. 

; Neyman has presupposed certain conditions concerning p(z) 
dF /dz: first, that it exists and secondly that it is either zero or a function 
of z (for Type A, Z isa constant, A~'). Then, as various approximations 
to F, he writes: 


for Type A p.(z) = Ofor0<<z< A',A>1 ) 
for Type B p,(z) = 3A forO0 <z << 2A",A>2 
= 0, elsewhere (= (77) 


2 
for Type C = < <84",42>2 


="0, elsewhere 


a 
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Equ. (77) suggests that one can write, generally, 


p,(2) = 1)A 2} 
(78) 


p-(z) = 0, elsewhere 


For illustration consider 


3 
pPa(z) = (16A~* — 2”) for0 <2 4A", A>4 


(79) 
= 0, elsewhere 
The form of (78) satisfies the two requirements on F(z). 
The moments of p,(z), as from (78), are 
Ir! 
niri{(r + 1)A~"} (80) 


(n+r)! 


whence on substitution of (80) in (76) we get the general characteristic 
function 


log = — 1) + Am — + 1)A7*}" 


n=2 (n+ 71)! 


~where m, = Am and m, = [(r + 1)A]/A. Edqu. (81) is identical with 
the general characteristic function that we assumed in the beginning 
of the paper, i.e., (2). 


QUALITY X QUANTITY INTERACTION 
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INTRODUCTION 


estimation of the effect of qualify and of the quality X quantity inter- 
action in the case of an experiment with 4 qualities of a fertiliser at 3 
equally spaced levels, the lowest of which is not zero. 

As the originator of this query, my object was to obtain information 
concerning the general case of a situation common in experimentation. 
Textbooks are not very illuminating, since they all deal with the same 
special case, that of a number of qualities at 3 equally spaced levels, 
the lowest of which is zero, so that dummy treatments occur at the 
zero level. Thus Fisher (3) shows that the ordinary subdivision (based 
on a hypothesis of independent additive effects of the factors) of the 
sum of squares of such an interaction table into sums of squares for 
main effects, dummy effects, and interaction is unsatisfactory, and 
proposes the more natural hypothesis that quality differences are 
proportional to quantity applied instead of constant for all quantities. 
He describes his estimation of quality effects as “calculating the ‘re- 
gression’ of the manurial response upon the manurial difference to 
which it is for the present purpose to be considered as proportional”. 
An alternative orthogonal subdivision of the sums of squares for quality 
and for interaction of quality and quantity (as obtained in an ordinary 
interaction table) is based on the identity 


+ + — = + + — 2y)’, 


where the left hand side of the identity represents the ordinary sub- 
division and the right hand side the subdivision proposed by Fisher. 
From the first term on the right hand side is derived the sum of squares 
for quality on the hypothesis of proportional response. The residue, 
which would be zero if the response to quality was exactly proportional 
to quantity (since x — 2y would be constant for each quality), is 
labelled “interaction” of quality and quantity. The tendency might 
be to regard the significance of this interaction as demonstrating the 
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Fisher (4), in a reply to a ya to this journal, has discussed the 
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fact that “the qualities are varying in their effect at different levels’’. 
_A little thought, however, shows that such variation, if present, is 
actually an effect of quality itself. As Fisher puts it: “The quantitative 
contrasts are differences caused by quantitative variations in the very 
substances which the qualitative comparisons are intended to compare”. 
The ordinary type of statement used in interpreting an interaction 
effect is therefore not applicable. Later (p. 148) Fisher supplies the 
type of interpretation necessary. Fisher also mentions the applicability 
of his method of subdivision to the case when the ratio of the two 
quantities used is different from 1:2. Yates (7) restates Fisher’s findings 
without discussion, except to show how the dummy treatments may be 
used to eliminate block differences in calculating the sums of squares 
for the interaction of quality and quality X quantity with the third 
factor of a3 X 3 X 3 confounded design. Cochran and Cox (2) discuss 
the subdivision of the sum of squares for an identical type of inter- 
action table on the same hypothesis. The estimate of quality effects 
is shown to be actually a weighted mean. They also derive Fisher’s 
interaction of quality and quantity as “differences in curvature’. 
The effect of quality is labelled ‘differences in linear response’. This 
change of nomenclature has the effect of making it easier to see how the 
analysis could be carried out if there were more than two quantitative 
levels in addition to the zero level. Nevertheless none of the above 
accounts does actually indicate how the methods are to be extended 
when the number of non-zero levels exceeds two, or when there is no 
zero level. . 
Kempthorne (5), under the heading ‘Partially factorial experi- 
ments”, discusses a 3 X 3 Quality X quantity table, the quantities 
being equally spaced. He states that, if none of the amounts is zero, 
there is no new difficulty. Taking the usual example, with the lowest 
level zero, he shows how to fit three quadratic response curves with 
common base point, there being no restrictions on the seven constants 
fitted. He also discusses a model in which the basic response law is 
uw + fx + gz’, the qualities differing in containing different concen- 
trations of the basic causative factor, so that an amount x of the first 
quality is considered equivalent to quantities Az and rz of the second 
and third qualities. It is noted that, when the response law is assumed 
to be linear, the situation corresponds to that described by Yates (and 
Fisher). However, certain difficulties with this model are mentioned, 
and it is concluded that “satisfactory methods for the analysis of such 
data have not been evolved for more or less general situations”’. 
Williams (6) discusses a more general problem, proposing a model 
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in which the effects of one factor (quality) are considered to be pro- 
portional at different levels of the other factor (quantity). The pro- 
portions are, however, not simply those of the quantities applied. 
Williams states that the simpler hypothesis, if realistic, is to be preferred, 
but that in many experiments involving qualities and quantities the 
effects will not be proportional to quantity but will increase less rapidly 
than quantity. His proposed method of analysis therefore includes 
the estimation of these proportions from the data in such a way that 
the sum of squares for the weighted main effects of qualities (c.f. 
Cochran and Cox) is a maximum and the residual interaction sum of 
squares a minimum. However, since the method includes the finding 
of latent roots of a matrix, it is not likely to commend itself to the 
average experimentalist who so often must perforce do his own analyses. 

The more complex type of analysis such as that proposed by Williams 
seems to be more suitable when it is the nature of the response curves 
of the different qualities which is the main subject of investigation. 
In practice it will more often be the case that the experimentalist 
merely wishes to estimate the yields of his various treatments, compari- 
sons being made by difference, supported by statistical tests of signifi- 
eance if possible. As Kempthorne points out, this can always be done 
irrespective of any knowledge of the response curves, whether assumed 
a priori or presumed from the data. Suitable subdivision of the treat- 
ments set of degrees of freedom will, however, enable the nature of the 
treatment differences to be more clearly brought out. Even the ordinary 
additive hypothesis will achieve this to some extent, for, if the inter- 
action proves to be significant, then it will be realised that main effects 
based on this hypothesis are of little value and attention is directed 
back to individual treatment means. On the other hand the experi- 
mentalist is then at liberty to try some other hypothesis (the simpler 
the better) which may explain the nature of the. observed effects. 
Fisher, Yates, and Cochran and Cox therefore test the hypothesis that 
the response to quality is proportional to quantity applied as being more 
natural than the hypothesis of equal additive effects in these circum- 
stances. When these simple hypotheses have been exhausted, there 
is certainly no compulsion to undertake any more complex analysis 
such as those suggested by Kempthorne or Williams. Furthermore 
there is no doubt that an analysis based on the proportional hypothesis, 
whether realistic or not, provides the answer to very realistic questions, 
viz:—(1) “What is the average response to each unit application of 
each quality of fertiliser? (2) Does this average response vary from 
quality to quality? (3) Is the response per unit application of a quality 
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consistent over the range of applications of that quality?”. These 
questions are in general more realistic than those posed under the 
additive hypothesis, viz:—(1) ‘“‘What is the average yield of each 
quality over its various applications? (2) Do these averages differ? 
(3) Are differences between qualities constant over the different applica- 
tions?” It would seem to me that, provided one is not interested in 
the exact nature of the response curves, the answers to the first three 
questions, coupled with an appropriate examination of the inter- 
action table, will provide an adequate routine analysis in this situation. 

The subject of this paper concerns, therefore, only the analysis of 
quality X quantity interaction tables when the proportional hypothesis 
is adopted. Although Fisher (4), while stating that opinions may 
legitimately differ, has demonstrated the method to be followed in a 
particular case, it is possible that the ordinary experimentalist who 
does not claim to be a professional biometrician may be glad to have a 
little more meat to chew on than the rather bare bones of an algebraic 
identity! The statistical situation is actually identical with that known 
in biological assay as a “‘slope-ratio assay’’, as described, for example, 
by Bliss (1), except that there the zero dose is usually represented. 
There is, however, a difference in experimental objective, and in addition 
many workers familiar with the analysis of field experiments may be 
unacquainted with the literature of biological assay. 


STATISTICAL MODEL FOR AN ORDINARY INTERACTION TABLE 


Suppose we consider the case of 3 quantities (x = 1, 2, 3), none of 
which is zero, and for simplicity the case of 2 qualities only. This 
latter simplification does not impair the generality of what follows 
except in respect of minor details. The yields are set out in an inter- 
action table as under, and without loss of generality we shall treat 
them as derived from a single replication. 


Quantities 
Qualities mm Totals 
1 2 3 
A yu Yiz Yis Yio 
B. Ya Y22 Yas Y20 
Totals Yo Yor Yos y 
Differences by1 dys 
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The ordinary procedure for partitioning the sum of squares for this 
interaction table may be summarised as 


TOTAL SS. QUALITIES QUANTITIES 
1 1 
INTERACTION 


This partitioning implies the assumption that with no interaction the 
differences dy; would be the same apart from random fluctuations. 

In practice the sum of squares for quantities will usually be sub- 
divided into linear and quadratic effects, with corresponding subdivision 
of the interaction sum of squares. If we assume for the meantime that 
responses are purely linear, the subdivision is equivalent to postulation 
of the following model (ignoring random components), 


Quantities 
Qualities 
1 2 3 
A — by mi + by (Model 1) 
B be me + bo 


where m, and m, , b, and b, represent the mean levels of yield and the 
linear regression coefficients, respectively, for the 2 qualities. We may 
replace m, and m, by m — q and m + q indicating deviations from a 
common mean level brought about by the effect of quality. Similarly, 
we may replace b, and b, by b — 6b ard b + 4b, where b indicates a com- 
mon regression coefficient and 6b is a correction (or interaction compo- 


nent) to allow for the possibility that b, and b, may differ. The model 
now is: 


Quantities 
Qualities 


Means 
1 2 3 


A m —q — (b — m—q | m—q + (b — m —q | (Model 
B m+q — (b+ 5)| m+q |m+q-+ (b+ m+q 2) 


Means m—b m m+b 
Differences 2q — 25b 2q 2q + 25b 


S38 
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Thus m and q are estimated from the mean and mean difference, respec- 
tively, while b and 6b may be estimated by applying the appropriate 
orthogonal polynomial values (—1, 0, 1) to the means and differences 
respectively, for the 3 quantities. The procedure is seen to be equivalent 
to fitting individual linear regression lines to the data supplied by each 
of the 2 qualities at 3 levels and to fitting a common regression line to 
the data for both qualities combined. The sums of squares correspond- 
ing to b and 6b represent, respectively, the sum of squares due to fitting 
the common regression line and a sum of squares representing deviations 
of the slopes of the individual regression lines from that of the common 
regression line. 


STATISTICAL MODEL ON THE ASSUMPTION OF PROPORTIONAL RESPONSE 


In Model (2), if 5b = 0 (i.e. if there is no interaction), the differences 
between qualities for the 3 levels are all equal to 2q, so that the effect 
of the quality difference is assumed the same for all levels of the fertiliser. 
However, Fisher (3) has pointed out that a more natural hypothesis 
would be to assume that the response to quality is proportional to 
quantity applied, in which case these differences should be, in the 
present example, in the ratio 1:2:3. To meet this, we now propose an 
alternative model:— 


Quantities 
Qualities 
1 2 3 
A m+ by m + 2b; m + 3h 
B m 4- be m + 2b. m a 3b. (Model 3) 
Differences Ab 2Ab 3Ab 


Fitting the constants m, b, , and b, to a set of data would be equivalent 
to fitting two linear regression lines to the two sets of data supplied 
by the two qualities, but with the restriction that the vertical differences 
between the lines at z = 1, 2, 3 must be in the ratio 1:2:3, or, in other 
words, that the lines must have a common point at z = 0. 

The observational equations under the hypothesis of proportional 
response, with all other deviations considered random, are therefore:— 


| 
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m 


1 


6 14 Yu + 
6 14 Yar + 2Y22 + 
Solving these, we have 
14(b, + b.) = Yor + 2Vo2 + — 12m, 
whence 42m — 36m + + + 3¥o3) = 7Y 


= {Yur + + + + 2¥o2 + os) — 7Y} 


b, 


{Yor + + 3y23 + 3( Yor + 2¥o2 + 3¥ 03) — 7Y} 
The quality difference, Ab = b, — b, , is therefore estimated as 

1 

14 (dy: + 25y2 + 3dys). 


The fitting of these constants (or these two linear regression lines) 
absorbs 2 degrees of freedom, if we exclude that for m, which is merely 
equivalent to a change of origin similar to that for the correction factor 
in an ordinary sum of squares. There remains a third degree of freedom 
available from the previous model, which represents differences between 
the lines fitted on the basis of the present model and the best fitting 
lines provided by Model 2; in other words it represents deviations from 


1 Yis 
1 1 You 
1 ad 2 j Yoo 
The normal equations are:— 
6 66 6] ~ | 
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the ratio 1:2:3 of quality differences for the different quantity levels. 
This is the so-called “quality quantity interaction”. 

An additional constant may be introduced to represent such devia- 
tions. Taking the first quality only, we may propose the amended 
model 


m+ b, + kid, m + 2b, + m + 3b, + 


where k, , k. , k; are fixed for a given interaction table, and are deter- 
mined as follows:— 


(1) There must be no component of d, in the expression used to 
estimate quality differences, viz. + + 


Hence k, + 2k, + 3k; = 0. 
(2) Since we are still fitting linear regression lines, we must have 
k, —k, = ks — ky 
These equations yield the integral solution 
k=1, k= —2, 


and the model becomes (with d, and d, constrained to sum to zero): 


Quantities 
Qualities 
1 2 3 
A m + by + 4d; | m + 2b; + d; | m + 3b; — 2d; 
B m + be + | m + 2b. + dp m + 3b, — (Model 4) 


For this model the normal equations are:— 


db, ba d, d, 


6 6 6 . -|= Y 

6 14 Yu + 2yi2 + 

6 14 Yor + + 3y23 

3 21 : + Yi2 — 
Ayer + Y22 — 
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The values for m, b, , and b, are unaltered. The remainder of the solution 
is 


1 
d, = 21 (4411 + Yi2 — 2s — 3m) 


1 
d, = 21 (4y21 + — 2Y20 — 3m). 


The sum of d, and d, is zero, as required. 
The linear functions having detached coefficients (4,1, — 2), ortho- 
gonal to (1,2,3), correspond to the subdivision proposed by Fisher (4). 


NUMERICAL ILLUSTRATIONS 
Example (1). 

The following table shows the yields of a single cut from an experi- 
ment on natural pasture. Ammonium sulphate (S) was applied at 


300, 600, and 900 lbs. per acre and Ammonium nitrate (N) at equivalent 
rates. 


(Quantities 
Qualities Totals | (1) + 2(2) | 4(1) + (2) 
1 2 3 + 3(3) — 2(8) 
Ss 197.61 218.04 223.97 639.62 | 1,305.60 560.54 
N 186.20 215.63 217.86 619.69 | 1,271.04 524.71 
Totals 383.81 |: 433.67 441.83 | 1,259.31 | 2,576.64 | 1,085.25 


(a) Using Model 3:— 


6m = 1,259.31 X 7 — 2,576.64 X 3 = 1,085.25, 


‘ whence m = 180.88 


(= total of last column) 


b, = 7 (1,305.60 — 1,085.25) = 15.74 


b, = — (1,271.04 — 1,085.25) = 13.27. 


| 
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Hence the fitted values are:— 
Quantities 
Qualities 
1 2 3 
S 196.62 212.36 228.10 
N 194.15 207 .42 220.69 
Differences 2.47 4.94 7.41 


The differences are in the ratio 1:2:3 as required. 
(b) Using Model 4:— 
We find 


on ai (560.54 — 542.62) = 0.85 


d, = xi (524.71 — 542.62) = —0.85, 


m, b, , and b, being as before. The fitted values are then:— 


Quantities 
Qualities 
1 2 3 
Ss 200 .02 213.21 226.40 
N 190.75 206 .57 222.39 
(c) Using Model 2:— 
We find 
m = 209.88 
q = 3(619.69 — 639.62) = —3.32 


b = 14.50 
6b = 1.32. 


It may be confirmed that apart from rounding errors the fitted values 
obtained from this model are identical with those obtained by using 
Model 4. 
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QUANTITY OF FERTILISER RAPPLIEO 


These fitted values have been plotted in the diagram. The effect 
of the quality difference is revealed in the comparison of the two con- 
tinuous lines, firstly by their width apart and secondly by the angle 
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between them. These two criteria are obviously not independent, 
however, since an increase of the distance between the lines at z = 1 
must increase the angle between the lines if the ratio 1:2:3 of vertical 
distances between the lines at « = 1, 2, 3 is maintained. This is clear 
from the fact that in testing the significance of the quality difference, 
we are testing the difference between b, and b, of Model 3, and this 
difference not only measures the difference in slopes of the two regression 
lines, but also the distance between them. 

¥rom the diagram it appears that the broken lines are somewhat 
dissimilar to the continuous lines, providing evidence that the hypo- 
thesis of proportional response to quality is not obeyed as closely as it 
might be. 

Hence it may be stated:— 


1. The test of significance of the quality difference tests the significance 
of the difference in position and slope of 2 linear regression lines fitted 
to the data for the 2 qualities on the assumption of proportional re- 
sponse to quality. : 

2. The test of significance of the quality X quantity interaction tests 
the validity of the assumption that qualitative responses are propor- 
tional to quantity applied. Should this interaction prove significant, 
the assumption of proportional response can no longer be considered 
valid and the previous test and estimates of qualitative effects will 
become meaningless. On the other hand, irrespective of the significance 
or otherwise of the test for effect of quality, the significance of the inter- 
action would be clear evidence of some kind of qualitative effect. 


In the numerical example the yields were totals of 12 plots, so that 
the sums of squares for the various effects are 


(1,305.60 — 1,271.04)’ 


8.8. Quality = 2x 12 = 3.5547 
(560.54 — 524.71)? _ 
8.8. Quality X quantity = 2.5472, 


each with one degree of freedom. (In practice a row of differences 
(S — N) would be added to the interaction table). 

These two sums of squares must add to the same as the sums of 
squares for quality and differences of linear effects appropriate to 
Model 2, namely to 


75 (639.62 — 619.69)? + i (197.61 — 186.20 — 223.97 + 217.86)’ 


= 5.5167 + 0.5852, 
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checking the computations. It would thus appear that in this example 
the ordinary additive hypothesis is nearer the truth, since it gives the 
higher sum of squares for quality, but in actual fact all of the above 
mean squares were less than the error mean square of the experiment. 
The illustration is therefore not a good one in respect of demonstrating 
experimental agreement with the proportional hypothesis. It was 
chosen so that the points and lines of the diagram would be distinct, 
and also because the ratio of quantities applied corresponds to that 
of the theoretical discussion, which in turn discusses the particular case 
dealt with by Fisher. In many examples the points and lines are so 
close to one another that a clear diagram would be impossible. In any 
case it does sometimes happen that qualitative differences observed 
at the lowest level keep more or less constant at the other levels, es- 
pecially when there is comparatively little response to the additional 
applications. (See Example 3). 


Example2. Yields of maize (in lb.) are given for superphosphate and 
rockphosphate each applied at the rates 100, 300, and 500 lb. per acre. 


Quantities 
Qualities Totals | (1) +3(2)} 138(1) + 
1 2 3 + 5(8) |4(2) — 5(3) 
S 84.73 85.05 89.09 258.87 785.33 996.24 
R 83.45 82.00 82.93 248 .38 744.10 998.20 
S+R 168.18 167.05 172.02 507.25 1,529.43 | 1,994.44 
S-R 1.28 3.05 6.16 10.49 41.23 —1.96 


In this example Model 4 needs to be amended to suit the 1:3:5 
ratio of quantities. Thus, here k, = 13, k, = 4, ks = —5, and the 
normal equations are: 


mb bb d, 

9 = [ 507.25) 
9 35 785.33 
9 35 744.10 
12 210 996.24 

12 210|  [998.20, 


| 
| 
| 
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The solution is m = 83.102, b, = 1.069, b, = —0.109, d, = —0.005 
d, = 0.005. The interaction components are very small, and the fitted 
values 


Quantities 
Qualities 
1 2 3 
Ss 84.11 86.12 88.68 
R 83.05 82.96 82.33 
are close to the observed yields. 
Sums of squares are computed as follows:— 
_ (41.28)? x 

Quality X quantity _. p04, 

2 X 210 X } 


since each yield in the table is a mean of 4 plots. The error mean 
square of the experiment was 19.38 with 56 D.F., and so the quality 
difference is significant at the 1% level. The average response to 100 
lb. superphosphate was 1.069 lb. per plot, whereas 100 lb. of rock 
phosphate depressed the yield by 0.109 Ib. on the average. The mean 
response to 100 lb. of phosphate (average of both qualities) is 0.480 
lb. per plot, which corresponds to the average response given by b in 
Model 2—in this example 3(172.02 — 168.18) = 0.480. This is a 
consequence of the fact that the common regression line of Model 2 
is an average for both pairs of regression lines (distinguished by con- 
tinuous and broken lines in the diagram). 

Standard errors of these estimates may be found by expressing 
b, , b, , etc. as linear functions of the yields in the interaction table. 
Thus 


9 
35b, = Yu + + 5Yi3 = 24 (13 Yo, + 4Yo2. — 5) os); 


b, = 8x 35 (—3lyi: — + 12y12 — 12y22 + + 15y2s), 


Le. 
so that 
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Var. (b;) = (31° + 39? + --- + 15’)o/4, 


where o” is the variance of a single plot, 
= 0.019200”. 


Hence §.E. of 6, or b, = 0.01920 X 19.38 = 0.610. 
Variance of average response, 3(b, + b.), 


1 o 
= 64 X 2X S.E. = 0.389. 


Variance of difference of average responses, b, — b2 , 


1 o 
= 35 x2xX 43 S.E. = 0.526. 


QUADRATIC AND HIGHER EFFECTS 


So far the problem has been considered on the basis of purely linear 
responses to quantity, but in actual fact the regressions may be non- 
linear. In the general case, if the overall regression (i.e. for all qualities 
combined) is significantly non-linear, then there will be curvature of 
the regression lines for at least some of the qualities. These curvatures 
may be similar or different. 

The fitting of constants to represent curvature effects may be 
simply done by the use of orthogonal polynomials. Since the degrees 
of freedom representing differences of curvature will then be orthogonal 
to those representing linear effects and differences of linear effects, 
they will also be orthogonal to those representing quality and inter- 
action of quality and quantity as defined above. 

In the numerical example we may therefore propose constants to 
represent quadratic effects, whereupon the model will become:— 


Quantity 
Quality 
1 2 3 
A m+b+4d+c m + 2b, +d — 2c, m+ 3b,+2d+c 
B m + be — 4d + | m + 2b. — d — 2c2| m + 3b, + 2d + 


Here a single constant d has replaced the two constants d, and d, 
(dq, + d, = 0). In the analysis the 2 degrees of freedom for c, and c, 


a 
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will be presented as one degree of freedom representing an overall 
quadratic effect (the mean of c, and c,), and a second degree of freedom 
representing the difference between c, and c, . 

Should neither of these degrees of freedom prove to be significant, 
the analysis based on the assumption of purely linear regressions will 
be valid, but should either test of significance prove significant, this 
result will supersede previous findings based on linearity alone. In 
the case of a significant difference between the quadratic effects there 
will also be evidence of differential qualitative effects. The same 
applies to cubic and higher effects in the general case. 

If we estimate c, and c, in the numerical example by applying the 
orthogonal polynomial values (1, — 2, 1) to the yields for each quality, 
it will be found that the fitted values correspond exactly to the observed 
values, since a quadratic curve can be inade to pass through any set of 
3 points. The sums of squares for average and difference of quadratic 
effects in the two examples given earlier are:— 


Example 1: 


(383.81 — 2 X 433.67 + 441.83)’ 
6 X 24 


§.S. Average quadratic effect = 


= 12.0756 
S.S Difference of quadratic effects (or Quadratic effect < quality) 


_ (197.61—2X218.04+4-223.97 — 186.2042 X 215.63 — 217.86)” 
144 


1.1201 


Example 2: 


(168.18 — 2 X 167.05 + 172.02)’ 


S.S. Average quadratic effect 6X} 


12.40 


= 5 
8.8. Difference of quadratic effects = O28 2 X38 + 6.16 


3 


= 0.60 


‘ 
: 
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Hence in neither example is there any significant evidence of curvature, 
though the reason for the relatively high sum of squares for average 
curvature in Example 1 is apparent from the diagram. 


DUMMY TREATMENTS 


If a zero quantity is included, it is usual to include one such plot 
for each quality in the experiment. This enables the overall linear, 
quadratic, etc. effects to be computed without difficulty by the use of 
ordinary orthogonal polynomial values, since the observations at each 
level are of equal weight. It will also probably be necessary from the 
point of view of arranging a suitable confounded design. In this case, 
provided that none of the totals in the interaction table is subject to 
block effects, differences between the treatment totals for the zero 
application will be due to experimental error. A sum of squares due 
to “dummies” is therefore separated out for inclusion in the error sum 
of squares. 

Occasionally in experimental work more than one set of control 
or zero-application treatments may have to be included. It may not be 
inappropriate therefore to consider the subdivision of sums of squares 
for the interaction table in a general case. Let P and Q be 2 factors 


occurring at p and q levels, respectively, and let the interaction table 
of yields be as follows:— 


Level of Level of Factor Q Totals 
Factor P 1 q 
1 yn Yin tie 
2 Ya Y2q Yx 
Totals Ya Ya Yog 


The y;,; are for simplicity assumed to be from a single replication. 
P and Q are factors of such type that, if a zero application of Q occurs, 
then automatically it is associated with a zero application of P. 


| 
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Let us suppose that the first r levels of factor Q, but none of factor 
P, are zero or control levels. Then we may compute the following sums 


of squares:— 


S... All treatment combinations = yi; — A, 


with pg — 1 D.F. 


SS. Factor Q = 5 Yi, B, with — 1 DF. 


B may be subdivided into 


j=r+l1 


S.S. Controls v. Remainder = - = 
=C, with1 DF. 


S.S. Controls among themselves 
- 4( =D, withr—1D-F. 
i=l i=1 


S.S. Remainder among themselves 


ThuB=C+D+E8. 
8.8. Factor P (ignoring the fact that some levels of Q are zero levels) 


= with p — 1 DF. 


The correct 8.8. for for dummies) 


j=rtl 
with p — 1 D.F. 


S.S. for Interaction PQ (ignoring dummies) 
=H=A-—B-F,_ with(p— I(q-—1) DF. 


_ The correct 8.8. for Interaction ny (allowing for dummies) 


g=rt+l 


with@ — 1) DF. 
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i.e. it is computed in the ordinary way from a truncated interaction 
table in which the control treatments are excluded. 

In addition there is a sum of squares representing differences of 
control plots for the various levels of P 


i=l P i=1 
The sums of squares J and D represent dummy comparisons and must 
be included in error. 

It can readily be seen that not only is A equal to B + F + H, 
representing the ordinary subdivision for an interaction table, but also 
tooB+G+I+J. 

Consider now the very common case where Q is a quantitative 
factor with lowest level zero, and P is a factor representing different 
qualities of Q. With only a single control level, D does not exist. A 
possible method of subdivision of the sum of squares of such an inter- 
action table would be into B + G + I + J, but with B preferably 
subdivided into polynomial effects (linear, quadratic, etc.) rather than 
into C + E, and with G + I alternatively subdivided into sums of 
squares for Quality, Quality X quantity, Quality < quadratic effect, 
etc. as indicated earlier. With minor differences this is actually the 
type of analysis suggested by Bliss (1) for slope-ratio assays, except 
that the sum of squares for Quality X quantity (labelled ‘“non-con- 
vergence at zero dose”) is there obtained by a subtraction method. 
However, such a subdivision is in general not completely satisfactory, 
being practically equivalent to treating the zero quantitative level 
as a qualitative control, except for the fact that the average effect 
of quantities is estimated over a wider range by treating the control 
as a zero level. By this method we would not be considering the zero 
level when estimating quadratic and higher degree regression effects of 
the individual qualities. Occasionally (for example, when there is a big 
response to the first level of application followed by moderate responses 
to subsequent levels), this may be an advantage, since a polynomial 
curve might be unsuitable as a representation of the data over the 
whole range of applications, but adequate over the part of the range 
excluding zero. 

The above consideration seems to lie behind the change of nomen- 
clature introduced by Cochran and Cox whereby the sums of squares 
in a 3 X 3 table with a zero level, which Fisher calls “Quality” and 
“Quality X quantity”, are renamed “Differences in linear response” 
and ‘Differences in curvature’. The extension to more than three 
levels is thus suggested and will be illustrated by example. In both the 


at 
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experiments used earlier there were actually zero levels which were 
previously suppressed in order to provide the type of example required. 
Example 1: The yields of the two zero levels were 123.26 and 109.88. 


2 
SS. Dummies = -123:26 = 7.4593 


8.8. Linear effect of nitrogen (N’) 


_ (—3 X 233.14 — 383.81 + 433.67 + 3 « 441.83)? 
ws 20 X 24 


= 951.8362 
8.8. Quadratic effect of nitrogen (N”’) 


_ (233.14 — 383.81 — 433.67 + 441.83)? 
7 4X 24 


= 211.5531 
S.S. Cubic effect of nitrogen (N’”) 


_ (—233.14 + 3 X 383.81 — 3 X 433.67 + 441.83)" 
20 X 24 


= 7.2792 


The regression effects above are obtained as usual by applying the 
appropriate orthogonal polynomial values to the quantity totals. 

It is now proposed to subdivide the sum of squares G + I with 
3 D.F. by applying to the differences S — N at the three non-zero 
levels the set of coefficients 


1 2 3 
—1 1 (1) 
3 1 
instead of the set 
1 2 3 
4 1 —2 (2) 
1 —2 1 


It will be noticed that, whereas set (2) is completely orthogonal, set 
(1) is not, because the second and third rows consist of the quadratic 
and cubic orthogonal polynomials for 4 equally spaced levels less the 
first coefficient, which cancels owing to the common zero level. Thus 
the sum of squares for “Differences in linear effect’’ (or Quality) is 
orthogonal to those for “Differences in quadratic effect” and ‘‘Differ- 
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ences in cubic effect”’, but the last two are not orthogonal to one another. 
It would be possible either to combine these into a joint sum of squares 
for interaction with 2 D.F., obtained by subtraction, or to compute 
“Differences in quadratic effect ignoring differences in cubic (and, in 
general, higher) effects”, leaving a residual interaction sum of squares. 
The latter course is probably preferable, since effects higher than 
quadratic, and consequently differences of these, are likely to be negli- 
gible. There is, however, no independent check on the computations 
in either of these cases. 


Hence we compute as follows:— 


8.8. Differences in linear effect = 3.5547 (as before) 
8.8. Differences in quadratic effect ignoring differences in cubic effect 


_ (=11.41 — 2.41 + 6.11)” 
3 X< 24 


Residue (differences in cubic effect eliminating differences in quadratic 
effect) = 2.8418. 

Only N’ and N” are significant in this experiment, and the signifi- 
cance of N” shows that the average response to 300 lb. of nitrogen 
was not consistent over the range studied. The sum of squares G + I 
was, in fact, not as great as the error mean square and so in practice 
no subdivision would be necessary. 

Example 2: The quantities are in the ratio 0:1:3:5, and only 4 plots 
with a mean yield of 78.20 were available at the zero level; there is 
thus no sum of squares for “‘Dummies”. The computations are simpli- 
fied by using the appropriate weighted orthogonal polynomial co- 
efficients. These are (to be applied to marginal totals) :— 


= 0.8256 


Linear 3 17 (Divisor 1,162) 
Quadratic 608 —108 —544 348 (Divisor 1,227 ,072) 
Cubic --30,720 | 28,800 | —19,200 | 5,760 | (Divisor 3,406, 233,600) 


The divisors refer to a single entry in the interaction table, and must 
actually be divided by 4, since each entry is the mean of 4 plots. 


S.S. Linear effect of phosphate (P’) 


_ (—18 X 78.20 — 11 X 168.18 + 3 X 167.05 + 17 X 172.02)? 
1,162 X } 


2 
= 97.05. 
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Similarly, 8.8. P’”’ 8.66. 
8.8. P’” = 59.32. 
The sum, P’ + P” + P’”, checks to the S.S. for Quantities, 165.04. 


S.S. Differences in linear effect = 97.14 (as before). 
S.S. Differences in quadratic effect ignoring differences in cubic 
effect 


_ (—108 X 1.28 — 544 X 3.05 + 348 X 6.16)’ 
(108 + 544° + 348”) X 2x} 


= 0.56. 
Residue = 0.08. 


In this experiment it is clear that the two qualities of phosphate have 
given significantly different average responses to 100 lbs. applied. 
There is no evidence of any variation in these responses; for P’”’, 
although sizeable, is not significant. 

Although the sum of squares for differences of linear effects (or 
quality) is the same as before, the estimates of the two average responses, 
b, and b, , are different in consideration of the additional information 
provided by the zero level. Amending the previous notation for the 
yields of the interaction table by the addition of yoo for the yield at zero 
level (which will now be included in the total, Y), we find the normal 
equations for m, b, , and b, to be 


m ob 

7 9 =| 585.45 
9 35 . 785.33 
9 35, 744.10, 


whence m = 81.035, b, = 1.600, b, = 0.422. The difference b, — b. 
is, of course, the same as before and has the same S.E. To find the 
S.E. of b, or b, we have 


b, = + 3412 + 5yis + (Yo. + 3Yo2 + 5Y¥ 03) — 


1 
= 35. X 83 (—315yo0 15ly, 234y21 177y:2 72y 


+ 505y:3 + 90y23). 
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Hence Var. (b,) = 0.014110’, 
S.E. = 0.523. 


Variance of average response, 3(1.600 + 0.422) = 1.011, which is the 
same thing as (7/1162) (167.91), 


~ 1162 ° 4’ 
S.E. = 0.452. 


Example 3. The following interaction table is from an experiment on 
wattle with 5 levels of phosphatic fertilizer (nil, 200, 400, 600, 800 Ibs. 
per acre), the plots of which were split for the comparison of 2 different 
types of phosphate. The figures represent mean heights per sub-plot 
at one year totalled over 9 sub-plots. 


Quantities 
Qualities Totals 
0 1 2 3 4 
A 67.1 77.9 Chat 77.5 78.5 378.7 
B 67.9 83.1 83.0 84.2 84.5 402.7 
Totals 135.0 161.0 160.7 161.7 163.0 781.4 
Differences 0.8 5.2 5.3 6.7 6.0 24.0 


A subdivision of the sum of squares for quantities by means of 
orthogonal polynomials resulted in the following analysis of variance:— 


Source of variation D.F. Sum of squares M.S. F. 
Linear 1 17.861 17.861 30.8** 
Quadratic 1 9.181 9.181 15.9" 
Cubic and quartic 2 4.580 2.290 4:0* 
Total 4 31.622 
Error (a) 14 0.5791 


The significance of the residual mean square shows that the routine 
procedure of fitting a second degree curve is inadequate. Inspection 
of the data reveals very little response to phosphate after the first 
200 lb., and so we subdivide alternatively as follows:— 
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Source of variation D.F. 8.8. MSS. 
Phosphate v. no phosphate 1 31.447 
Between phosphate levels 3 0.175 
Total 4 31.622 
Error (a) 14 0.5791 


It is evident that the variation between the non-zero quantities is 
small compared with experimental error. In this example, therefore, 
by way of an exception, it seems preferable to keep the zero level apart 
from the others, subdividing the 2 X 4 table excluding the zero level 
according to the method discussed earlier in the paper. This results 
as follows:— 


Source of variation D.F 8.8. M.S. F, 
Types of phosphate 1 6.645 6.645 32.9°%* 
Quality X quantity 1 0.871 0.871 4.3* 
Quality X quadratic effect 1 0.009 
Quality X cubic effect 1 0.032 

Total 4 7.557 

Error (b) 25 0.2022 


The significant quality X quantity interaction shows that the pro- 
portional hypothesis is not obeyed here, and an inspection of the quality 
differences shows that the simple additive hypothesis will probably be 
closer to the mark, as might reasonably be expected in view of the lack 
of response to additional applications of phosphate. 

The new subdivision on this hypothesis is:— 


Source of variation D.F. S.S. MSS. F. 
Types of phosphate 1 7.476 7.476 37.0°* 
Quality X quantity 3 0.081 

Total 4 7.557 

Error (b) 25 0.2022 
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Hence in this experiment an adequate measure of the qualitative 
difference is provided by an unweighted mean of the quality differences 
at the different quantitative levels. 
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SUMMARY 


A method proposed by Fisher of estimating the effects of quality 
and the interaction of quality and quantity in experiments containing 
a number of qualities of a fertiliser at a number of quantitative levels 
is discussed, and the interpretation of these estimates from the ex- 
perimental point of view is considered. 
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STRAWBERRY UNIFORMITY YIELD TRIALS’ 


G. A. BAKER AND R. E. Baker 
University of California, Davis, California 


Realistic and comprehensive mathematical models for the analyses 
of field trial data can be based only on an extensive and intensive study 
of uniformity trials. The collection of uniformity trial data is a time 
and money consuming project. It cannot be assumed that the models 
found adequate for one crop in one year at one place can be extrapolated 
with any degree of confidence to other quite different crops in very 
different environments. This is a well recognized problem and several 
workers (1), (2), (3), (4), (5), (6), (7), (8), and (9) have published 
recent results. 

The purpose of the present paper is to present two uniformity yield 
trials for strawberries. The variety grown was the same in the two 
cases but the environments were somewhat different for the different 
trials. Some calculations are made and presented including random 
sampling results in order to emphasize differences between the trials 
and point to possible arrangements of field trials that will lead to 
improved clarity of interpretation. 


DATA 


The data consist of individual plant yields in grams for two uni- 
formity trials. One consisted of 200 plants of the Tahoe variety grown 
in two double-row beds at Davis in 1946. The beds of two rows, which 
were one foot apart, were spaced 42 inches apart with 10 inches between 
plants in the rows. The rows consisted of 50 plants. These data are 
presented in Table 1. 


1This paper was read at a joint meeting of the Institute of Mathematical Statistics and the Western 
North American Region of The Biometric Society held June 15-16, 1951, at the Rand Corporation, 
Santa Monica, California, and an abstract appeared in Biometrics 7(3):300. 
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TABLE 1. TWO HUNDRED INDIVIDUAL PLANT YIELDS IN GRAMS 
DAVIS, 1946 
Double Bed No. 1 Double Bed No. 2 
176 145 141 139 
235 176 207 138 
217 96 155 167 
234 205 211 218 
239 207 261 158 
216 212 197 173 
318 138 295 152 
190 136 161 191 
181 145 228 296 
225 166 300 160 
266 317 274 163 
297 143 249 245 
226 228 211 268 
283 245 235 252 
219 167 236 159 
267 157 140 275 
265 144 125 185 
216 190 67 203 
130 215 273 132 
148 191 184 234 
174 212 213 199 
246 207 278 276 
198 228 181 242 
194 213 263 159 
260 174 228 193 
245 87 142 241 
207 154 269 211 
203 212 287 243 
260 155 290 176 
204 160 226 272 
136 87 228 270 
243 129 213 352 
325 251 165 256 
106 295 208 220 
222 185 254 277 
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TABLE 1—Continued 


Double Bed No. 1 Double Bed No. 2 


The second uniformity trial consisted of 500 plants of the Tahoe 
variety grown in single beds at Davis in 1948. The beds were 30 inches 
apart with 10 inches between plants. The rows were 50 plants long. 
The irrigation water was somewhat saline which may have increased 
the variability of plant yields. These data are presented in table 2. 

The mean yield in grams for the two trials were 208 and 173 re- 
spectively. The ratio of the standard deviation to the mean was 0.26 
for the 200-plant trial and 0.39 for the 500-plant trial. 


RANDOM SAMPLING RESULTS 


Various sizes and arrangements of plots were imposed on both of the 
uniformity trials and “varieties” assigned to the elementary plots at 
random. Conventional F-values were calculated and the results are 
given in tables 3 and 4. F-values using both residual error (internal 
variance) and replication z variety interaction are given simply to 
call attention to differences between the two trials and the various 
arrangement of plots. i 

It should be noted that in general with q varieties and p replications 
(q!)” arrangements are possible. Since in the present case if we have a 
particular set up we can rearrange the “variety” designations in q! 
ways and get the same F-value. It follows that there are (q!)”"* distinct 
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228 235 161 295 
124 139 242 239 
288 97 280 273 
; 240 194 237 159 
181 175 185 244 
231 95 193 250 
193 105 160 261 
144 163 204 252 
208 271 228 185 
191 207 226 182 
166 117 128 195 
216 152 298 280 
282 148 293 196 
232 151 267 199 
. 284 150 278 183 
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TABLE 2. FIVE HUNDRED INDIVIDUAL PLANT YIELDS IN GRAMS, DAVIS, 1948 
SINGLE BEDS 30 INCHES APART 


129 203 126 136 141 81 88 144 225 80 
72 95 71 249 109 47 240 121 167 71 


124 | 130 | 106 88 | 197 | 154 | 160 | 142 | 233 20 
95 94 87 57 97 | 156 | 201 | 284 | 230 | 124 
100 | 126 73 94 | 150 | 163 | 109 | 143 | 170 | 175 
63 71 | 153 93 | 173 | 181 | 324 | 197 | 203 | 171 
76 | 136 | 138 39 74 | 229 | 192 | 275 | 198 | 115 
153 46 92 55 | 117 | 106 | 221 | 207 | 245 | 143 . 
88 | 149 | 122 96 | 186 | 160 | 311 | 206 | 285 | 175 . 
58 | 116 | 149 | 123 | 136 | 226 | 373 | 319 | 286 | 211 
72 | 114 | 142 67 94 | 253 | 201 | 249 | 154 39 
178 | 215 | 197 | 106 | 160 | 229 | 192 | 188 | 206 | 177 
194 | 175 | 182 | 147 | 137 | 225 | 193 | 157 | 151 | 260 
193 | 220 | 131 | 123 | 182 | 208 | 293 | 326 | 158 | 256 : 
205 | 180 | 117 | 145 | 240 | 402 | 339 | 127 | 232 | 186 
212 | 234 | 127 | 114 | 258 | 241 | 168 | 244 | 220 | 194 
206 | 100 | 183 | 216 | 234 | 212 | 210 | 197 | 188 | 137 
171 | 169 | 106 | 250 | 190 | 161 | 204 | 132 | 279 | 201 
119 | 228 | 231 | 149 | 163 | 230 | 299 | 220 | 130 | 159 
155 | 269 | 110 92 | 316 | 192 | 299 | 214 | 287 94 
155 | 127 | 223 | 102 | 251 | 126 | 155 | 216 | 150 | 170 
251 | 122 | 320 | 338 | 317 | 107 | 270 | 116 | 294 | 172 
275 | 117 | 209 | 136 | 191 | 153 | 179 | 132 | 293 90 
278 91 | 194 | 175 | 122 | 265 | 208 | 105 | 199 | 265 
324 | 128 | 203 | 210 | 211 | 145 | 233 | 196 | 230 | 132 
190 | 221 | 240 | 129 82 | 253 | 144 | 148 76 | 268 
134 | 243 | 251 | 171 | 208 97 | 115 | 262 | 231 | 193 
189 | 295 76 | 108 | 168 | 248 | 304 | 170 | 155 | 112 
141 | 97 | 108 | 169 | 156 | 189 63 | 239 | 21) 43 
180 | 286 | 148 | 142 | 156 | 112 | 131 | 166 79 80 5 
175 | 265 | 201 193 | 220 | 114 | 212 97 | 118 59 r €. 
238 | 317 | 204 | 184 | 224 | 216 87 | 193 | 194 87 
272 | 18 | 208 | 128 | 178 | 266 | 135 | 209 | 124 74 
206 | 194 | 272 | 123 | 197 | 182 97 | 111 128 | 132 
78 | 293 | 120 | 173 | 151 | 1385 | 261 57 59 | 224 
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TABLE 2—Continued 


values available in the parent population of possible F-values. In 
most cases this number is large in comparison to the 100 values obtained 
by random sampling. 

In addition to the randomized block set-ups of tables 3 and 4 some 
latin squares were tried on the 500-plant uniformity data. For 50 
random assignments for a 10 x 10 latin square two F’s were significant 
at the 1% level and five at the 5% level. The mean was 1.26 and 
variance 0.36. For 50 random assignments for a 5 x 5 latin square 
one F was significant at the 1% level and seven at the 5% level. The 
mean was 1.60 and variance 2.07. The F’s for the latin squares were 
calculated using interaction as the error term and not the variances 
due to differences between plants within primary plots. The latin 
squares seem to find significant differences about two or three times as 
often as expected. 

Real variety differences ranging from — 20 to + 20 grams per plant 
and differing by 4 grams, except that the two center “varieties” differ 
by 8 grams, were imposed on two plot arrangements of table 4. The 
mean yield for table 2 is 173.4 grams, and thus the magnitudes of the 
superimposed real differences had a range of about 23 per cent of the 
general mean. The variance of these real differences was twice as large 
as needed for a probability of 0.8 in detecting the falsehood of the 
null hypothesis at the 1% level. The results are given in table 5. 
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58 | 136 | 220 | 169 | 135 | 179 | 259 | 199 | 201 | 180 
146 | 344 | 120 | 222 | 201 | 144 | 155 | 187 | 215 | 239 
171 | 126 | 148 | 289 | 242 | 149 | 133 | 243 | 289 | 102 
321 | 225 | 96 | 230 | 128 | 72 | 375 | 171 | 299 | 247 
275 | 333 | 166 | 219 | 113 | 187 | 217 | 122 | 191 | 105 
f 108 | 249 | 128 | 139 | 228 | 56 | 78 | 212 | 119 | 90 
53 | 128 | 234 | 173 | 163 | 164 | 203 | 55 | 183 | 195 
128 | 225 | 190 | 107 | 189 | 246 | 194 | 178 | 149 | 232 
88 | 185 | 317 | 116 | 246 | 211 79 | 180 | 213 44 
206 | 215 | 137 | 222 | 251 | 255 | 285 | 231 | 172 | 135 
104 | 171 | 134 | 83 | 172 | 185 | 194 | 172 | 315 60 
112 | 104 | 289 | 172 | 119 | 191 | 243 | 196 | 124 | 96 
52 | 151 | 306 | 209 | 207 | 113 | 189 | 209 | 215 | 21 
68 | 117 | 198 | 202 | 136 | 118 | 241 | 172 | 133 | 162 
82 | 144 | 127 | 172 | 220 | 163 | 213 | 151 | 137 | 7 
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Other sets of real differences, without end, could be imposed. The 
ones chosen give an uniform spread at a level that should be easily 
detected if we were dealing with situations adequately described by 
classical models. 


CORRELATIONS BETWEEN THE YIELDS OF PLANTS AT DIFFERENT DISTANCES 


Wiebe (9) has calculated the correlation coefficients between yields 
of rod-rows of wheat planted one foot apart for various distances between 
rows. His correlations start at about 0.8 and taper off linearly with 
distance being still about 0.4 at 30 feet between rows. For the straw- 
berry trials the closest plants have a correlation of less than 0.30 which 
is barely significant at the 5% level. This low correlation disappears 
very quickly with increasing distance between plants. These results 
are confirmed by the data on peanuts reported by Robinson, Rigney, 
and Harvey (5). 


DISCUSSION 


Certain aspects of two uniformity trials based on individual plant 
yields are presented and compared. Many of the random sampling 
results conform to the current textbook procedures of treating yield 
trials as far as saying differences exist when in fact they do not exist. 
The 200-plant trial shows much better conformity in some respects 
than does the 500-plant trial. 

One result that is very clearly indicated is that much better results 
are obtained if the primary plots are taken across rows instead of down 
rows. This positioning of plots is confirmed by Sharpe and Blackmon 
(6) with pecans and Taylor (8) with strawberries in England. 

Table 5 indicates that real differences may not be quite as detectable 
as is usually indicated. This is possibly due to the nonrandomness of 
soil and plant variations. 

The correlation between near plants is so low as to be practically 
zero. 


SUMMARY 


Two uniformity yield trials of 200 individual plant yields and 500 
individual plant yields are presented. The 500-plant trial was some- 
what more variable as measured by the ratio of standard deviation to 
the mean. 

F-values are given for certain randomized block designs with random 
assignment of “varieties”. These F-values indicate that the trials are 
different and that some plot arrangements are better than others. In 
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particular primary plots running across rows are much better than 
primary plots running down rows. 

F-values for latin square design indicate a possible tendency to 
find differences when they do not exist. 

Correlations between plants close together are low. 
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ANALYSIS OF A TRIPLE RECTANGULAR LATTICE DESIGN 


V. N. Murty 
Central Tobacco Research Institute, India 


Harshbarger (1946, 1947, 1949) developed a set of incomplete 
block designs in which the number of varieties (or treatments) is the 
product of two consecutive integers and the number of replications 
of each variety is either 2 or 3 and their multiples. He called them 
simple and triple rectangular lattices respectively. 

K. R. Nair (1951) has shown that a Triple Rectangular Lattice 
when v = 3 X 4 is a p.b.i.b. (partially balanced incomplete block) 
design with m equal to 3. In the June 1952 issue of Biometrics he 


has given the analysis of Simple Square and Rectangular Lattices. 

The object of the present note is to give a numerical illustration of 

the Analysis of Triple Rectangular Lattice Design following the methods 

of K. R. Nair. For the purpose the data for illustration have been 

taken from p. 295 of Experimental Designs by Cochran and Cox. 

This example is specially interesting as the Intra Block error is zero. 
The parameters of the design are: 


v = 12, 


Px = 33 0 OF. 
2.3 


— ves, 6-2, 
n, = 6, n, = 3, nz = 2, 
22 1). 402 330 
110 20 0 
The matrix | 
6-1 
& 3 1 9 
422 
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The determinant of the matrix 
A = 360. 


The twelve varieties may be represented schematically. 


1 2 3 
Xx 124 132 143 
4 5 6 
213 x 234 241 
7 8 9 
314 321 X 342 
10 11 12 
412 423 431 


This scheme enables one to write down quickly the first, second 
and third associates of any given variety. 

The total S.S. and the Block 8.8. (ignoring varieties) are calculated 
in the usual manner. The varieties 8.S. (eliminating blocks), is calcu- 
lated from the table given below. 


TABLE 1. INTRA BLOCK ESTIMATES OF VARIETAL EFFECTS 


Variety 

No. | (6) | (61) (e1)} (52) | (ee) | (53) (es)| | ©} 
55 | 123 42 |2,3,8,11,5,7 | 2580)18.67 
2. 34 94 8 34/4,8 12 60}11.67 
3. 18} 87 | —33 |1,2,6,9,4,11 | 43/5,8,10|—15]7,12) 5|/—1740| 6.67 
4. 6 49 | —31 |5,6,7,10,3,11)—57|1,9,12) 37/2,8 51| —2460| 4.67 
5. 20} 89 | —29 |4,6,2,12,1,7 | 55/3,8,10|—19|9,11] —7|—1380| 7.67 
6. 46 | 107 31 |4,5,3,9,8,12 |—55)2,7,11] 13) 1500/15.67 
47 | 129 12 |8,9,4,10,1,5 | —2)2,6,11) 30)3,12!|—40| 780/13.67 
8. 62 | 143 43 |7,9,1,11,6,12) |—23] 3300|20.67 
9. 48 | 142 2 |7,8,3,6,2,10 32/1 ,4,12 420\12 .67 
10 22 95 | —29 |—19]1,6 | 73]/—2100] 5.67 
ll 25} 84] —9 | 5115,9 |—27| —660| 9.67 
12 31 | 100 | —7 |10,11,2,5,6,8) 15]1,4,9 | 138]3,7 |—21| —300|10.67 

Total |414 |1242 0 0 0 0 0 
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(a) = Total yield of the variety. 

(8) = Total of the Blocks in which the variety occurs. 

(vy) = k. (a) — 

(5,) = First associates of the variety. 

(e.) = Sum of (7) for varieties in (6,) 

(5.) = Second associates of the variety. 

(e.) = Sum of (7) for varieties in (6,) 

(53) = Third associates of the variety. 

(es) = Sum of (7) for varieties in (6;) 

(n) [BasCss B3;C 23] (y) Bis [Css (e:) C23 (€2)] 
+ Cis [Bas (€2)] 

(&) = (n)/A + Grand Mean = zane varietal mean 


S.S. due to varieties _ D0 (y)(n) _ = 509.33 
(eliminating blocks) - A = 980 


TABLE 2. ANALYSIS OF VARIANCE 


Source DF. | SS. | SS. Source 
Blocks (ignoring varieties) 11 | 761.67} 204.00 Blocks (eliminating varie- 
Varieties (eliminating blocks)} 11 | 509.33/1067.00 Be (ignoring blocks) 
Intra-Block error 13 0.00} 0.00} Intra-Block error. 
Total 35 |1271.00|1271.00 
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QUERIES 


GeorcE W. SNeEpEcoR, Editor 


QUERY: Comment on Query No. 96, Vol. 8, December 1952, 
103 Page 383. By C. B. Williams, Department of Entomology, 
Rothamsted Experimental Station. 

“T find myself in disagreement with the comments of Professor 
Snedecor in the problem of a ‘missing-plot’ in an experiment on the 
attraction of insects to bait traps, as discussed by him in Biometrics. 

“The submittor of the problem had made 12 repetitions of 4 different 
baits as attractants for fruit flies, the number of insects caught per trap 
per repetition varying from 3 to 75. In one repetition one bait was 
missing, and when the usual formula was employed to find a replacement 
for this the number came to —6.64!! The questioner asked ‘How do 
you explain this? Have we made a mistake?’ 

“Professor Snedecor’s comment was that no error had been made in 
the calculation, and that the value obtained ‘is not intended as an 
estimate of the missing datum. It is merely a number to be put in 
the empty space so that you can conveniently extract the remaining 
information from the experiment’. 

“This may be good mathematics but it is biologically meaningless. 
The repetition in question had very low values for the other three baits 
(6, 16 and 23), and the missing bait was on an average the poorest of 
the four, so that had a proper figure been obtained experimentally it 
would probably have been between 0 and 10. But to suggest that any 
increased accuracy (which is surely the object of mathematical reasoning) 
is obtained by putting a negative value, cannot possibly be upheld. 

“When an absurd result is obtained by mathematical reasoning 
either the mathematics is wrong or else it is based on false premises. 
In the case under consideration the fundamental assumption has been 
made by the experimenter—and accepted (with one qualification) by 
Professor Snedecor—that the differences between baits within a single 
repetition, and the differences between repetitions of a single bait, 
can be soundly based on an arithmetic scale. In other words it is as- 
sumed that a particular bait tends to attract a fixed number n more 
(or less) insects than another bait irrespective of the general level of 
catch. 

“Much experience in problems of comparing changes in insect 
numbers, particularly as caught in traps, has however convinced me 
that differences between catches are usually in similar ratio or pro- 
portion—a particular trap tends to catch n per cent more or less than 
another. Once this is recognised it follows that such data can only be 
soundly compared on a geometric, or logarithmic scale. 


“Details of this work will be found in the following publications: 
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Philos. Trans. B. 226, 361(1936); Ann. Appl. Biol. 24(2):404 (1937); 
Trans. R. ent. Soc. Lond. 90:234-239 (1940) and Bull. ent. Res. 
42(3):513-517 (1951). The last reference deals specifically with the 
problem of comparison of traps with different attractants. 

“To apply the method to the problem under discussion the number 
of insects caught in each bait repetition must be converted to a logarithm, 
and if this is given to two decimal places an accuracy of about 2% is 
obtained. The original table as quoted by Snedecor then becomes as 
shown below— 


BAITS 
Repetitions A B C D Total 
1 0.90 1.88 1.61 1.67 6.06 
2 0.70 1.72 1.48 1.46 5.36 
3 0.90 1.76 1.58 1.40 5.64 
4 0.48 1.28 1.30 1.00 4.06 
5 1.23 1.61 1.54 1.90 6.28 
6 1.18 1.53 1.45 1.11 5.27 
7 0.95 1.69 1.38 1.08 5.10 
8 1.28 1.75 1.36 1.26 5.65 
9 — 1.20 0.78 1.36 (3.34) 
10 1.30 1.52 1.42 1.23 5.47 
11 0.85 1.40 1.18 1.08 4.51 
12 1.32 1.82 1.59 1.32 6.05 
Total (11.09) 19.16 16.67 15.87 (63.45) 
“The missing value is given by 
4(11.09 12(3.34) — 63.45 
x A(11.09) + 12.34) 


33 

“The anti-log. of this is 4.6 which is the estimated number of insects 

likely to have been caught at bait A in the 9th repetition. This esti- 
mate is both mathematically sound and biologically reasonable. 

“To continue the analysis it follows that, after incorporating the 
missing term, the mean log. catches per repetition with the four baits are:— 
0.98 : 1.60 : 1.39 and 1.32 

so that the geometric mean catches are:— 
9.5; 39.8; 24.6 and 20.9 


The corresponding arithmetic means on the original analysis (including 
the missing term) are: 
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12.5; 43.7; 27.1 and 25.5 


These two sets of numbers appear at first sight to be somewhat simi- 
lar, but their interpretation is very different. The geometric technique 
implies that if trap A gets z insects, then trap B will get 4.2z, trap C 2.6z 
and trap D 2.2x. But the arithmetic method implies that if trap A gets 
z insects then trap B will get x + 31, trap C will get x + 15, and trap D 
z+ 13.” 

On one point I heartily agree with Dr. Williams: the occur- 
ANSWER: _ rence of any unusual feature, such as the negative value 

of the missing plot, should cause the investigator to 
re-examine both his theory and his experimental procedure. On other 
points my views are less extreme. 

As intimated in the original answer, there are logically satisfying 
reasons for making a transformation. The counts of insects may 
follow Poisson distributions in which the variances change with the 
means. The unequal variances vitiate both the estimates and the 
tests of hypotheses. An appropriate transformation alleviates the 
difficulty. But no conclusion would be changed in the experiment 
under discussion. In my experience this situation has occurred so 
often that I tend to be a bit lax about transformations in this type of 
data unless close decisions are anticipated. 

As for a negative “‘missing value”, I do not consider it absurd nor as 
necessarily indicating anything wrong with either the model or the data. 
It may arise merely from the incidents of sampling. If the model is 
appropriate and if the experiment is successfully performed, the use 
of the missing plot technique, with well known corrections for bias, 
leads to unbiased estimates of the treatment means and to unbiased 
tests, the sign of the missing value being irrelevant. As we have often 
emphasized, these estimates and tests can be arrived at without the 
calculation of a missing value; it is merely a device for making available 
the more familiar forms of calculation. True, a negative value would 
not ordinarily have biological meaning but, in my experience, there is 
nothing to be learned from assigning a meaning to it; usually it is only 
the treatment means and a measure of experimental error that are 
needed and these may be estimated correctly irrespective of the sign 
of the missing value. 

It happens then that in this particular experiment the tests of 
hypotheses lead to the same conclusions with or without the trans- 
formations. But, because of the unequal weightings associated with 
the counts, the treatment means are doubtless better estimated from 
the transformed data. 
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232 D. J. FINNEY. (Indian Council of Agricultural Research, New 
Delhi.) The Choice of Levels. 


In this paper were outlined considerations involved in the choice of 
numerical levels of a quantitative factor to be used in an experiment. 
The choice must depend upon the purpose of the experiment and the 
extent of existing knowledge. The principles were illustrated by refer- 
ence to the choice of doses for a slope-ratio biological assay and the 
choice of levels of fertilizer to be used in experiments for determining the 
economic optimal dressing. 


233 K. R. NAIR. (Forest Research Institute, Dehra Dun.) Some 
Unsolved Problems in Experimenta! Designs. 


The paper points out an unsolved problem in the construction of 
optimum confounded designs of an s, X 8, factorial experiment in blocks 
of s, plots where s, and s, are mutually prime and s, < s,. It also cites 
general problems yet to be solved in extending the results to the general 
asymmetrical factorial experiment involving more than two factors and 
expresses the need for fundamental investigations into incomplete fac- 
torial experiments, sequential experimentation, etc. 


K. KISHEN. (Uttar Pradesh Statistician, Department of Agri- 
234 culture, Lucknow.) Some Unsolved Problems in Experimental 
Designs. 

Dr. K. Kishen gave an account of his investigations into the problem 
of constructing generalized asymmetrical factorial designs. He gave 
two general solutions, but said that the class of designs he had investi- 
gated were not optimum designs and that there was need for investigating 
such designs. 


A. ANANTHAPADMANABHA RAU. (Mysore State Agricul- 
235 tural Department, Bangalore.) Some Difficulties in Experimen- 
tation with Coffee—A Plantation Crop. 


A plantation crop like coffee offers difficulties additional to those 
offered by annual crops to a Field Experimenter. High variability in 
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yields between plants in the same year, year to year variability in yields 
of the same plant, in addition to the correlation between yields of suc- 
cessive years and the cyclic trends, are mainly responsible for field ex- 
periments with coffee being, in general, unsuccessful in giving significant 
results. Exhaustive study of yield data on coffee in uniformity trial 
and manurial trials has pointed out the necessity for a new approach to 
this problem. Inconsistent results are obtained by the application of 
usual methods of analysis of variance and orthogonal polynomials— 
frequently applied to data with trends—as these do not satisfactorily 
represent the trends present in coffee yield data. Application of other 
methods has given some significant results regarding the variability and 
the manurial effects. These are discussed. 


236 V. G. PANSE. (Indian Council of Agricultural Research, New 
Delhi.) Long Term Experiments. 


The paper outlined the general principles governing the design of 
long term experiments in agriculture and described a rotational cum 
manurial trial in progress at the Institute of Plant Industry, Indore, 
since 1947, The experiment involves comparison of two crop rotations 
and the response of the rotation crops to three types of nitrogen, four 
frequencies of its application and application of phosphate. The experi- 
ment is laid out in two replications, each consisting of seven crop plots, 
three for one rotation and four for the other. In each crop-plot there are 
24 main plots for nitrogen and frequency of application treatments, and 
each main plot is sub-divided into three sub-plots for phosphate treat- 
ment. The experiment thus includes a total of 1008 sub-plots and is 
planned to run for 12 years in the first instance. The results obtained so 
far showed that cotton following groundnut in the second rotation gives 
a substantially higher yield than when it follows a millet in the first 
rotation and that the response of crops to nitrogen from ammonium 
sulphate and oilcake is distinctly superior to that from farmyard manure. 


237 UTTAM CHAND. (Indian Council of Agricultural Research, 
New Delhi.) Experiments on Tree Crops. 


The paper presents a brief review of some of the experiments which 
are being done in India to determine the manurial requirements of coco- 
nut, citrus, cardamom and pepper. In drawing up manurial and cultural 
experimental programmes on tree crops flexibility of any recommended 
design with a view to introduce other factors in the future which are 
orthogonal to the first set should receive particular attention. With this 
end in view asymmetrical designs should be avoided as far as possible. 


it 
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The paper makes a plea for more intensive agricultural research to 
determine the rate at which manurial doses should be increased with 
advancing age of the trees. 


238 V. N. AMBLE. (Indian Council of Agricultural Research, ’e~ 
Delhi.) Animal Experiments. 

The paper described the salient features of switchback and switchover 
trials which are of use in animal experiments and indicated how the 
extension of the switchback trial to an even number of periods beyond 
two would permit the examination of the presence of the carryover effect 
even in cases where only two sets of animals are allotted to the two 
switchback schedules. Where, however, carryover effect is initially 
suspected and estimation of purely direct effect is desired, two additional 
sets of animals put on the two treatments continuously would_be 
necessary. 


